Sunday, March 03, 2024
Intro to Generative AI, Part 1: What Is Generative AI? Its Definition, Applications, and Impact
A Year of Generative AI: From Free-for-All to Established Ecosystem, the World Has Changed
硅谷101
309,782 views Jan 3, 2024 PALO ALTO
On November 30, 2022, OpenAI's ChatGPT went live, and the wave of large AI models swept in: Silicon Valley's startup market caught fire overnight, venture capital pivoted at full speed, AI concept stocks soared in the public markets, the tech giants sounded red alerts, and the battle over humanity's future officially began. From foundation models to infrastructure, MLOps, and consumer applications, a generative AI ecosystem has now taken shape.
In this video we look back at what Silicon Valley's giants and unicorns did in generative AI over the past year, talk with veteran practitioners about how large models have changed the world, and explore how AI will evolve further in 2024.
In this episode:
00:05-01:17 In just one year, AI has advanced at breakneck speed
01:18-07:17 OpenAI in front: ever larger, ever more secretive
07:18-10:32 Microsoft's year: friend and rival at once
10:33-17:21 Google's year: all-out effort under a code red
17:22-20:59 Meta moves first on open source: the LLaMA release
21:00-22:20 Mistral AI releases the open-source model Mistral 7B
22:21-26:50 The big upstream winners: the chip giants
26:51-28:50 Founders who can't keep up with OpenAI's releases
28:51-32:13 2024, the year of applications
[About 硅谷101]
We are a channel founded by veteran journalists and hosts from Chinese and international media, offering in-depth analysis of Silicon Valley's innovation trends and sharing the latest in tech in a relaxed style. We have interviewed top tech leaders, logged tens of thousands of hours of media experience, produced investigative reporting, and run in-depth stories that reached tens of millions of readers and set off nationwide discussion and Weibo trending topics. Our aim is to turn professional media craft and research skill into accessible new media.
Follow us, and set sail for the future from here.
Contact: video@sv101.net
[Previous episodes]
• Las Vegas's multi-billion-dollar viral Sphere, and the inheritance shackles and redemption of a "New York villain" heir to a wealthy dynasty
• Do artificial sweeteners cause cancer? A deep dive into the sweet revolution weaving between bans, politics, and money | Biotech special (2)
• Semaglutide, which took 30 pounds off Musk: a venomous lizard, an anglerfish, 41 years, and billions of dollars | Biotech special (1)
• Sam Altman returns! On the fears and convictions of the "mutineer": Ilya Sutskever, OpenAI's technical soul
• VinFast, "the world's No. 3 carmaker by market cap": a Vietnamese tycoon's American dream whose awe and glory lasted just 16 trading days
• [Hardcore] Nvidia's GPU, risen amid lawsuits and notoriety: it never stopped fighting, and it has no permanent friends
• [Deep dive] Apple's Vision Pro is here: the R&D of the four tech giants and the rise and fall of the XR industry
• [Deep dive] Why AI robotics is developing so slowly (easter egg inside 🎉)
• [Deep dive] The making of OpenAI: clashes among top-tier capital and the idealism of tech titans
182 Comments
@TheValley101 (pinned by 硅谷101, 2 months ago): Timeline as in the description above. Thank you all for supporting us in 2023; we hope to bring you even better tech and business coverage in 2024 🌹
@alitpiggy (1 month ago): Excellent! Looking forward to more.
@xiaodarui (1 month ago): Great video! Watched it start to finish.
@HardyDimension (1 month ago): Thanks for the research and the walkthrough 🙏🙏👍👍
@edwardhuang2475 (1 month ago): Wonderful show, I love it 😊🌹
@michael-tn3nu (2 months ago): Thanks for the superb explanation.
@pinzhizhang1229 (1 month ago): Your content is excellent ❤
@btc6245 (2 months ago): Had to like this one before even watching.
@user-ev7fd2yp7o (2 months ago): I love a tech channel of this quality.
@benjaminyong5225 (1 month ago): YouTube recommended you to me several times; I finally clicked, and I've found a treasure. A very high-quality channel. Keep it up!
@SoarerD (1 month ago): My favorite finance channel 😊
@popingwong3348 (8 days ago): Rich content, clearly explained. Even an ordinary viewer like me is hooked!
@nyunyubo (2 months ago): Thanks for the overview. It gave me a broader and more systematic picture of AI's progress ❤❤❤
@williamnie860 (3 days ago): A rare creator of genuinely high-quality, high-caliber videos. Fantastic, what a pleasant surprise!
@awhappy (2 weeks ago): Very useful! Thank you so much for producing this! Appreciate it :)
@HXX5269 (2 months ago): My favorite tech channel ❤ Please keep updating, thank you! ❤😊
@Richard_Qian (8 hours ago): Great editing, and the writing is superb.
@user-ps1kc6uc9q (2 weeks ago): High-quality video. Keep going!
@froglegs4910 (1 month ago): Love 💕😘 polished, professional content!
@gohpaul3089 (2 months ago): Good morning 🌞 Happy New Year 2024. For your video postings, always a thumbs up 👍 first ❤❤❤
@Bill-dl9xr (2 months ago): Brilliantly explained.
@ajax.phajax3150 (1 month ago, SGD 6.98 Super Thanks): Thank you!
@kohlaypoo1523 (2 months ago): Beauty and brains.
@Lao-Ke (1 month ago): Very professionally made, and the script is excellent 👍👍👍👍
@karenhe4656 (1 month ago): Nice video 🎉❤
@junnanmao5631 (2 months ago): Very good.
@qianma853 (2 months ago): One of my favorite channels ❤
@theobobcool8990 (1 month ago): 2023 was year one of large models, and I was lucky enough to work on an LLM project. I'm also lucky to have found a creator as professional and focused as 硅谷101. Here's to the new year, let's keep going 🎉
@jingqiwu2865 (2 months ago): Superbly produced!
@reliable-helper (2 weeks ago): Very good insight and information from insiders of the industry.
@amhowaye1605 (8 days ago): Look at a high-quality video like this: the comment section is calm and friendly, unlike the junk videos whose comments are nothing but endless mindless bickering. Really great. I hope everyone can slow down and look at the genuinely valuable world beyond political news and entertainment media.
@DUODUOflower (3 weeks ago): Well said.
@hedi_dreamland (2 weeks ago): People will gradually come to understand what they themselves are. 18:05 This is really good: putting ideas out now to draw more people in matters more than anything. 30:49 Very, very much looking forward to it.
@user-jv8rr8ey2k (1 month ago): Really nice.
@loveziyou (2 months ago): The pride of Chinese-language tech channels 😊
@DULENSHK (1 month ago): Remember when the bitcoin craze took off? What is the essence of technology? Serving humanity. How would a technology whose ultimate goal is to replace humans shatter the structure of human society? What happens when humanity's root productivity and creativity are replaced? Then again, what we have now doesn't yet feel like true AI.
@user-ve7cd1kd2s (2 months ago): Putting this together can't have been easy. Subscribed.
@samuelzm1 (2 months ago): You covered the players, the models, and advances in compute. I hope you'll also look at what's changing on the algorithm side.
@music1band (2 months ago): Happy New Year 🎊
@enzohsieh (2 months ago): Thanks 🎉🎉🎉
@allenculbertson8170 (1 month ago): I think I could learn this language and enjoy doing it. What a beautiful thing.
@elon.s.kenedy (2 months ago): I love listening to you talk. The host is lovely and easy on the eyes, and her voice never gets tiring no matter how long I listen.
@Where-the-god2024 (1 month ago): After watching, I feel there are too many AI "creations" to keep track of; it's dizzying. If you could take a big-picture view and use the relationship between AI and capital flows to trace AI's history in the capital markets, the forces behind AI might become much clearer. I'm also curious what "naming rules" tech companies follow for their generative AI products. Are the names just arbitrary? Decoding the names might even reveal some inside history 😂😂
@ShusenWang (1 month ago): Your content quality is so high! The facts check out, and you even landed an interview with someone like Ion Stoica.
@zoroy1ee (1 month ago): One second in, and I subscribed on the spot.
@adam888hoh (1 month ago): Could you cover the standout companies of the second half of 2023?
@lichun1972 (5 days ago): You're mesmerizing. Love you.
@lingchih4893 (1 month ago): Quality.
@chenyeshao3887 (1 month ago): Could 硅谷101 do an episode on the AI discussion communities common in the US and Europe, and on recurring offline meetups? Many thanks.
@kh-dk8yo (9 days ago): Thanks for sharing. Could you introduce some practical products? For example, taking a set of lyrics and using AI to produce the music and the video. Thanks!
@bunnyru33 (1 month ago): Love it. Hope you pass 100K subscribers in the new year 🎉
@BC18139 (2 weeks ago): I follow AI content closely, yet YouTube only recommended this excellent channel today. YouTube gets one star from me!
@yichensitu6043 (2 months ago): We are lucky to live in this era and witness history.
@ambrosia8068 (3 weeks ago): It feels like ChatGPT has already cut back on consumer-side compute; answers have become short, lazy, and error-prone, and the experience keeps getting worse.
@davidduan5246 (1 month ago): What software do you use to produce the videos?
@littlebookbook (1 month ago): I don't understand what open-sourcing a large model really opens up. The most important part of a large model is the training data.
@EliasZh926 (2 weeks ago): Your English is lovely to listen to.
@chaojunshen8751 (13 days ago): Is AI used for the video's visual editing too? Otherwise the workload would be enormous.
@johnchai8546 (1 month ago): Rooting for 陈茜 ~❤
@Design-Enjoyment-Happiness (2 months ago): Once you've used it, you know Mistral-7B doesn't handle Chinese well; it often repeats itself meaninglessly. For now, small models may still be far behind large ones.
@youngjohnson1305 (1 month ago): Whatever the brand, there is only one manufacturer... prices won't come down.
@changyu100 (2 weeks ago): The essence of research is understanding nature at a higher level; that is the skill by which humans surpassed other species. Technology is nothing to fear: just as nature has wolves, tigers, viruses, and bacteria, it is constant competition that drives evolution. The same holds for AI. Individual human intellect may soon fall behind the computer, which naturally triggers some instinctive panic, but recall the panic when humans first met wolves, oxen, and horses, or the unease when trains replaced carriages. Reason tells us technology is the force that keeps liberating human productivity and drives social change. There is no need for unnecessary fear; embrace the future more bravely.
Also, as social animals we could learn from ants and bees: no individual needs to be outstanding; if each does its part, collective intelligence emerges on its own.
Excellent show. Thanks to the creators, and keep going, everyone!
@rundao8627 (3 weeks ago): Beautiful face, beautiful voice. Is she real? 😂
@rexfosyed3361 (1 month ago): In the future only AI can beat AI. So don't be afraid: AI should keep developing, not be stopped.
@howdareyouare (2 months ago): Only by embracing civilization, learning from it, and truly acknowledging what makes advanced civilizations great can a nation win the world's respect, instead of howling all day that everything is the fault of you-know-which countries.
@jinguo1225 (2 months ago): 👍👍👍
@knowkn7993 (1 month ago): Qian is so pretty.
@user-bx5xo3qv5f (1 month ago): OpenAI: raising its own money when the time comes?
@sarirowang2283 (13 days ago): The AI world outside changes by the day, while people behind the wall can only gaze at it and sigh.
@ssb5171 (2 weeks ago): Very rich content, but too long to finish in one sitting, and the momentum fades. It should be split into parts.
@huili1920 (1 month ago): Where's the depth? 😂
@sophontec2822 (1 month ago): hype
@qingganjiang447 (1 month ago): It's all so advanced that I feel like I'm living in primitive times, and I don't know when I could ever train myself up to it. I'm old now, so it doesn't matter, as long as the research succeeds. Simple-minded, hopeful people want AI too, want to live like ordinary people, to step into society and protect their own livelihoods; if everything people depend on is taken away, anything can happen. A world where human history and culture are replaced, where machines fight wars, where none of society's problems get solved and we get this absurdity instead: it's terrifying.
@doutu4624 (3 weeks ago): So much gentler than 小Lin 😍
@user-pg6qh7sk5e (1 month ago): ❤
@wonderfulcxm (1 month ago): AI makes me anxious; it makes what I do feel meaningless. I can understand how top Go players felt about AlphaGo.
@karenhe4656 (1 month ago): Humanity is heading into an era where we let ourselves stop thinking and live only in fantasies of our own consciousness. A turning point for our species 😂😅😢 Half joy, half worry.
@guttmannjone6139 (1 month ago): 陈茜 is truly beautiful.
@tsunkeonglew7517 (12 days ago): It is a report of the reporter..... 😢😢😢😢
@hayama2363 (1 month ago): I hope AI can learn to understand cats and dogs.
@seanchang4442 (1 month ago): I am curious to know what this super AI model comments on God's intentions, and his mission on earth.
@joezhou6951 (1 month ago): A very thorough, in-depth overview, but the AI hallucination problem has not actually been solved yet.
@AiMusicLab89 (1 month ago): 桃花庵, an AI 孫燕姿 cover of a 唐伯虎 (Ming dynasty) poem, original demo: https://youtu.be/hdDLhTZncEw
@davidtomcai7821 (1 month ago): Using AI to train AI, distilling and imitating the data of the model's "brain," will accelerate AI's development. Perhaps one morning within a year or two you will wake up and super-AI will already have formed and awakened.
@thomasyang1565 (2 weeks ago): The host is so beautiful I forgot everything she said. Watched the face, missed the content!
@fralawable (10 days ago): On the Doomsday Clock, humanity is inside the final 90 seconds. Perhaps it will be AI that plays the last hand, handing humanity's fate over and ending human civilization!
@colorful-school-life (1 month ago): Thanks for summarizing how the Silicon Valley community saw the AI industry's development in 2023 and its outlook for 2024 👍. One small request: could you share 张潞's contact information? We are an AI-education startup and hope to discuss a project partnership with her 🙏
@dickkcid374 (3 weeks ago): 🏆🏆🏆
@user-pf4jl8eb5w (1 day ago): If a "species" smarter and stronger than humans appears on Earth, will it let humanity keep ruling the planet? Certainly not. Good luck to us all.
@kuehcheong6330 (1 month ago): AI? What even is that.
@user-cr9en2bi2u (1 month ago): China's AI has to step up too! It can't always be chasing from behind.
@rinoalove48699 (1 month ago): GPT feels like an idiot arts student answering questions: it spouts nonsense with a straight face, has no logic, and gets even basic arithmetic and middle-school physics wrong.
@bjzh7583 (3 weeks ago): However well AI performs, it's still just Reader's Digest. It can't deliver help when you actually need it.
@DULENSHK (1 month ago): How does this count as AI? It's just an auxiliary brain, a smart database and nothing more.
@mingthu9922 (2 weeks ago): The open-source ones are all damn worthless.
@alexdavid5611 (1 month ago): Changed the world? Its contribution to the world economy is zero.
@kaitos111 (2 weeks ago): Talking all this while ignoring China. What is this even about?
@user-hn8ob3li9h (2 weeks ago): Show interest in politics once and it pushes political channels at you until you go numb. Having videos like this to clear the head is wonderful.
@zhangzhongxing4543 (2 weeks ago): Google and Apple are dying.
@user-cx8ym1cw2o (3 weeks ago): A sci-fi hoax 😂😂😂
@ericzhou5249 (2 weeks ago): I expected a mindless pretty-face video; it turns out the pretty face is an expert.
@user-vo8qd9hg6i (2 months ago): Given the current damn-the-consequences pace of competition, reaching artificial general intelligence should take no more than two years.
@user-cz7jk1gy1p (1 month ago): You actually believe this?! Real intelligence is asking questions, not solving them!! Otherwise China, Korea, and India, the world's best test-takers, would already rule the earth. AI is little more useful than a calculator or a keyboard: merely a convenient tool.
@deny-yk9tc (1 month ago): Kid, stop naively daydreaming. God is not as feeble as your foolish imagination! Besides, the day an intelligence stronger than ours becomes real is the day humanity goes silent.
@updateupdate3665 (1 month ago): This host is outrageously beautiful.
@jianfengchen6256 (1 month ago): Overhyped. AI has already cooled off.
@updateupdate3665 (1 month ago): Hoping the host shows some leg next time.
Intro to Generative AI, Part 1: What Is Generative AI? Its Definition, Applications, and Impact
宝玉的技术分享
2,008 views Nov 1, 2023 · Generative AI for Everyone (introductory course)
Video description:
Welcome to "Intro to Generative AI." Since the release of ChatGPT, generative AI has drawn wide attention. It is changing how we learn and work, and is expected to lift productivity substantially and drive global economic growth; at the same time it raises concerns about downsides such as job losses. In this course you will learn what generative AI is, what it can and cannot do, and how to apply it in your work or business.
Generative AI is a class of AI systems that can produce high-quality content, including text, images, and audio. Text generation is currently the most influential, but these systems can also produce beautiful or photorealistic images, and even audio. Through this course you will see how generative AI makes building AI applications easier and cheaper, and how to identify and explore applications useful to a particular business.
The course runs three weeks. Week 1 digs into how generative AI works and its use cases. Week 2 covers generative AI projects, including finding applications and best practices for building them. Week 3 takes the macro view, examining the impact on business and society, and shares strategies for getting the most out of generative AI while keeping its use safe and positive.
🚀 Course highlights:
A non-technical, accessible introduction to generative AI.
A close look at generative AI applications in text, image, and audio generation.
Analysis of generative AI's potential impact on individuals, businesses, and society.
🔥 What you will learn:
What generative AI is and what it can do
How to apply generative AI in your work or business
Generative AI's potential risks and how to use it safely
🌐 Course contents:
How generative AI works, with use cases
Finding and building generative AI projects
Generative AI's impact on business and society
📚 Study tips:
Keep an open mind and actively explore generative AI's potential.
Pay attention to the real-world cases in the course and think about how to apply them to your own work.
Make sure you understand the risks and learn how to use generative AI safely.
🔗 Links:
Course page: https://www.coursera.org/learn/genera...
Featured playlist: Generative AI for Everyone introductory course (10 videos)
3 Comments
@edisontang5907 (3 months ago): Thanks 🎉 Came over from Twitter; learned a lot.
@ansonchan7289 (3 months ago): Thanks for sharing. A suggestion: use a separate subtitle file, which is friendlier to plugins. Also, a semi-transparent background behind the subtitles would be easier on the eyes (system subtitles come with one).
【生成式AI導論 2024】Lecture 1: What Is Generative AI?
Hung-yi Lee
38,527 views Feb 23, 2024
Course slides: https://speech.ee.ntu.edu.tw/~hylee/g...
5:00 Lecture 0 ( • 【生成式AI導論 2024】第0講:課程說明 (17:15 有芙莉蓮雷) ) opens by using generative AI for a classification problem. The point: if a machine can find the correct option among unlimited possibilities, then producing an answer from a limited set of options is no problem at all.
9:20 When this lecture illustrates machine learning, it uses only supervised learning, for the students' ease of understanding. In fact many machine learning methods do not give the machine input-output pairs, for example reinforcement learning (RL).
21:10 Here the Chinese character is used as the unit of the text-continuation game. In practice, different language models use different units for text continuation.
Further viewing: an 80-minute quick tour of large language models ( • 80分鐘快速了解大型語言模型 (5:30 有咒術迴戰雷) )
The 2015 course page for Machine Learning and Having It Deep and Structured: https://speech.ee.ntu.edu.tw/~tlkagk/...
The classroom microphone had problems that day, so the audio level in the video fluctuates at times. Apologies.
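An illustrative aside on the 21:10 note above (not part of the original description): the unit of "text continuation" is whatever tokenizer a model uses, and the same sentence splits into different units under different tokenizers. A minimal sketch using the tiktoken library, assuming it is installed; the two encodings named are the public GPT-2 and GPT-4-era vocabularies.

```python
import tiktoken

text = "人工智慧"  # "artificial intelligence" in Chinese

# Two tokenizers split the same string into different numbers of units.
for name in ("gpt2", "cl100k_base"):
    enc = tiktoken.get_encoding(name)
    ids = enc.encode(text)
    # Decode each token id back for inspection; a piece may show "�"
    # where a token boundary splits a multi-byte character.
    pieces = [enc.decode([i]) for i in ids]
    print(f"{name}: {len(ids)} tokens -> {pieces}")
```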
63 Comments
@sailize (9 days ago): Fantastic. Thank you, professor, for sharing first-rate teaching resources so selflessly.
@ching-yichen2858 (9 days ago): The professor is the most important AI educator in the Chinese-speaking world, and the one with the richest, most engaging teaching.
@ziligao7594 (7 days ago): The foremost AI educator in the Chinese-speaking world! Remarkable. Passing on knowledge, teaching the craft, resolving doubts!
@mengyuge3369 (4 days ago): Professor Lee, I'm so happy to see you posting again. I binged your courses while at school and graduated last year, but now that you've opened a new course I'll still watch every lecture.
@janchangchou777 (6 days ago): Every neural network is a multivariate regression analysis, part of statistical regression. Forty years ago, when I studied mathematical statistics, my work already involved this kind of AI and multivariate (including time-variable) regression analysis (vectors, matrices, linear algebra, probability, sampling...) for making precise, intelligent predictions about events. So-called self-learning is just using newly collected or simulated data to update the parameters of the original regression. Forty years ago AI was already a major topic in mathematical statistics, and Musk has a point: today's AI is built entirely on mathematical statistics. Back then, data analysts from every industry came to us to learn regression, went home, and built "neural networks" (industry regression models) of their own. PhD students doing this research were only allowed on the national supercomputer after midnight; two or three of us logging in together would crash it. We even devised a scheme, rather like today's mining rigs, to pool the idle compute of many personal computers, though the primitive networks of the day kept it from scaling.
As compute has grown over recent decades, today's commercial "large models" focus on generative and language services for the general public: building and correcting regression models (the so-called deep self-learning) over data from every industry, counted in the tens of millions to billions, forming enormous neural networks. So there is no need to over-hype today's AI: the theoretical foundations were laid more than forty years ago, and intelligent assistance was already highly developed within individual professions; limited compute simply kept the networks small, with fewer variables. Weather forecasting was the earliest clear example of mature prediction (AI) on dedicated supercomputers, and trading decisions are another classic case of intelligent assistance (AI/CIC). Packaging "regression over existing large-scale or simulated data, used for feasible intelligent prediction or assistance" as the still-unsolved mechanisms of biological neural networks not only plagiarizes mathematical statistics' central place in AI, it misleads AI's development and feeds public over-expectation and anxiety. "CIC: Computer Intelligent Collaboration" would be a more fitting name than AI.
The mathematics behind today's public-facing large models, machine learning (parameter updating) and deep learning (adding and updating layers of parameters and variables), was complete forty years ago; what was impossible then was the engineering. Solving a 100-million-by-100-million system of simultaneous equations demands staggering parallel compute.
What matters most in AI is the domain-expert groups of each field, not the AI programming companies (who merely apply settled statistical theory, programmed on machines, and use massive compute to fit and refine models). Only domain experts know which factors in the sea of data are the key variables of their field and which data are worth collecting and modeling: meteorology, economics, trading decisions, medicine, pharmacy, agriculture, genetic engineering, chemical engineering, autonomous driving, air-defense systems, image recognition and processing, structural mechanics, small-sample simulation (nuclear tests, air crashes), and so on.
Serving the public through language is itself an extremely complex discipline; the possible variables run to the tens of millions, and with multi-level filtered learning and correction the variables and compute run to the hundreds of millions, hence "large model." Which factors enter which layer of analysis is largely guided by linguists. Today's AI application companies, under the guidance of domain experts, use mature statistical theory to collect, organize, and analyze data and build a model for one domain: a computer-intelligent-collaboration tool for that domain. Public-facing generative services likewise must digest astronomical amounts of data, with experts from every field involved in shaping the models.
Generative AI can be understood as an upgraded super search engine. A traditional search engine matches keywords against a database and lists candidate results; the upgrade is that, given a more specific and constrained request, the system can assemble from the database something closer to the final desired output. This is a better-polished presentation of data that already existed (long curated by domain experts), achieved with heavy multi-layer statistical analysis; likening that polishing to human thought and intelligence (in order to attract money) is deeply misleading. Generative large models are, at bottom, the mass-market search databases such as Google's or Bing's, with mathematical statistics at the core and software and hardware engineering as the tools, optimized toward the user's need; so the generative-model endgame returns to a duel between the incumbent search giants.
CIC (or AI) should therefore be led by domain-expert groups, with mathematical statistics at the core and software and hardware engineering as supporting tools. That path is healthy, efficient, and precise. Today's development, led instead by software and hardware engineering, exists because the astronomical compute requires huge capital and promises fast returns, and it breeds today's distortions: misleading claims, inefficiency, lost precision, and a crowd of so-called AI companies selling compute and programming services while raising money.
Possible future models: (1) software and hardware engineering provide centralized compute and model-programming tools as an open platform for domain-expert groups, rather like today's chip industry, where domain companies act as design houses (MediaTek, HiSilicon) while the capital-intensive compute is supplied by infrastructure players (like TSMC, possibly with state backing); (2) given today's networks, the vast idle compute of household PCs could be contributed and aggregated as distributed compute, again something for software and hardware engineering to realize.
@dipper4684 (5 days ago): Thank you, professor, for letting even those of us at diploma-mill schools enjoy a first-rate education. This is what teaching without discrimination means.
@LinBond (8 days ago): Thank you, professor, for giving people in industry who want to learn a good source!
@xiaoyanlu8268 (6 days ago): Thank you so much for your selfless work, and thanks to you and your team for producing such a high-quality AI course.
@user-fk7eo3hm5e (8 days ago): Thank you for the course 🙏🏻
@zengjixiang (9 days ago): Looking forward to the next lecture!
@BQ_Nya (9 days ago): Really looking forward to the next lecture
@matrisys (8 days ago): Waited so long; the professor is teaching again. So happy.
@amiru6 (8 days ago): Thank you, professor
@LM-yh4ys (6 days ago): Thanks for sharing, professor
@TheZxdfgcv (7 days ago): Thank you, professor
@sawakun (8 days ago): Thanks 🙏
@chanyuan-cv4op (7 days ago): Thanks, professor. I'm your resident update-nagger 😊
@user-ev1zs5rk1m (5 days ago): Thank you, Professor Lee!!
@SyuAsyou (4 days ago): Fantastic course! Looking forward to the next lecture!
@jdyu1987 (9 days ago): Great. Thank you, professor ❤
@shanshan20082009 (9 days ago): Looking forward to it
@user-cf4vq6ck9x (3 days ago): Much obliged. Thank you, professor.
@kunhongyu5053 (8 days ago): It's finally here. I've waited so long 😭
@xaviertsai4063 (6 days ago): Class time~
@morikuniminamoto8573 (3 days ago): The text-continuation game is really interesting; I'd never thought about the problem that way before.
@benkuo9589 (16 hours ago): Thank you, professor. One note: the「縫隙的聯想」prompt is from this year's GSAT (學測) 😅 and the old 指考 has been replaced by the subject-based exams...
@uartimcs (9 days ago): Yo, so there's a new course 👋
@elvis1322 (9 days ago): Looking forward to the professor's other, more technical two-credit course.
@user-lt4zd9zj2h (8 days ago): If only there were more teachers like Professor Lee, who turn the complex into the simple, and less muggle-fooling content, technology would advance faster.
@user-or9po1yy5w (8 days ago): Following along from lecture one.
@kiracao5825 (5 days ago): Professor, I've been following your courses. Could you also offer convex optimization and stochastic processes? Many thanks.
@stan-kk2sf (8 days ago): Professor, I'd love to hear about world models: what a world model is, and how world models relate to generative AI. Will this course cover that?
@mixshare (7 days ago): 🎉🎉🎉
@palapapa0201 (8 days ago): Is this really free to watch?
@CornuDev (9 days ago): ❤
@mingminglearntech (7 days ago): Seriously, is this free to watch? QAQ Thank you, professor, for sharing so selflessly, and for the brilliant everyday examples that make forbidding terminology accessible to non-majors. As a primary-school teacher, knowing the background keeps me from stopping at mere "use": it helps when I have to explain to students what to watch out for with AI, and how to guide them properly in the classroom. From a viewer who fell into the rabbit hole at Lecture 0.
@user-qk6zr2li5q (4 days ago): Where can we find the homework "A World Where Real and Fake Are Hard to Tell"?
@user-jq2ux1bj5b (8 days ago): 1:17 The source of「電腦要會選土豆」: https://youtu.be/-W3pnicVgn0?si=5ISQ83pNvgXg8vOa
@shaco6034 (2 days ago): Is there a way for us to find the homework?
@xygen9527 (1 day ago): 3:11 That should be the GSAT (學測).
@willy7703 (8 days ago): Is this a live classroom? Class on a Saturday?!
@wuhaipeng (7 days ago): Machine learning is a process that changes the machine state from non-intelligent to intelligent.
@skyhong2002 (9 days ago): First!
@user-ud4um6wn7n (8 days ago): Heavens, until you brought it up I'd forgotten he still hasn't gotten off the ship...
@jacksonzhang6764 (5 days ago): Professor, will later lectures cover diffusion models?
@takeshi9458 (9 days ago): Third! 🎉
@user-tm2uq6ce4h (8 days ago): Three months of study ahead 😂
@moliwu1359 (7 days ago): This course meets two periods at a time, right? Why was only one period's content uploaded 😢
@v6854 (7 days ago): Tarek, have you watched it?
@yin8421 (1 day ago): As I understand it, the smallest unit of a neural network is the perceptron, which, like a biological neuron, passes a signal onward only when the incoming signal is strong enough. Is my understanding wrong? Wasn't it originally conceived in imitation of how biological brains work?
@deletemadog (7 days ago): Since when did「函數」become「函式」?
@miku3920 (8 days ago): The homework looks fun. Too bad only NTU students can do it.
@chsh320110108 (8 days ago): Thank you, professor
@Terry0319 (9 days ago): Thank you, professor
@wayne500000w (9 days ago): Looking forward to the next lecture!
【生成式AI導論 2024】Lecture 0: Course Overview (Frieren spoilers at 17:15)
Hung-yi Lee
65,837 views Feb 23, 2024
Course page: https://speech.ee.ntu.edu.tw/~hylee/g...
Course slides: https://speech.ee.ntu.edu.tw/~hylee/g...
The segment of the 2019 Machine Learning course that mentioned GPT-2: • ELMO, BERT, GPT
76 Comments
@willcheng8257 (9 days ago): The Frieren: Beyond Journey's End analogy is hilarious; the professor is so talented.
@droidcrackye5238 (7 days ago): Professor Lee truly keeps up with the times, never behind on the new anime season 🤣
@user-ud4um6wn7n (8 days ago): Thank you, professor, for giving people outside the field a chance to understand AI.
@jerryperng (8 days ago): Really looking forward to the 2024 course. Thank you, Professor Lee; my AI journey began with your videos.
@YuZui0715 (9 days ago): Can't wait! I was watching over dinner and spat out my food when Frieren appeared. The image fits perfectly.
@ericchen6313 (8 days ago): So considerate of the professor to include spoiler warnings.
@snowman6246 (9 days ago): So happy, professor. I've been waiting for this for so long.
@elvis1322 (9 days ago): Long awaited!
@kyle8886 (7 days ago): The serialization finally resumes!
@zengjixiang (9 days ago): Can't wait!
@takeshi9458 (9 days ago): Long awaited 🎉
@IthaiT (7 days ago): Can't wait 😍
@user-io9yx3jx3s (5 days ago): Thank you, professor, for giving those of us long since graduated a stepping stone into AI.
@sd016808 (9 days ago): Can't wait!!!!!
@liangyu3771 (5 days ago): Thanks to Frieren for bringing me here
@user-hf6mx4qu3l (5 days ago): Looking forward to it!
@novis1177 (9 days ago): Class is starting!
@patrickruan2290 (9 days ago): Happy New Year, professor~ next time we join Lady Frieren on the third demon-conquering expedition~
@robert11937apple (7 days ago): Thanks, professor. So vivid XDDDDDDDDD
@user-su8eu3fk5s (7 days ago): Listening to Professor Lee's lectures from a Tsinghua University dorm in Beijing! 🤩
@user-wm3hw6jy5l (6 days ago): A few years ago, in a university freshman course, one of the assignments was to watch Professor Lee's videos and write up reflections.
@esther5435 (5 days ago): Upvoting for Frieren 😊
@wayne500000w (9 days ago): Class is on!!
@ob1234 (9 days ago): Looks like this is my cross-school elective.
@chsh320110108 (8 days ago): Class is on 😂😂😂
@lililichunwei (6 days ago): The Frieren analogy is so vivid.
@jason9286 (8 days ago): Professor, this title really understands the algorithm.
@user-nn8vb4gp4f (8 days ago): The professor watches Frieren too 😆😆😆
@hudsonvan4322 (5 days ago): Bookmarked. Thanks, professor, for being willing to use our company's product.
@yuan0 (6 days ago): Subscribed
@YetEthanOnly (7 days ago): 190K subscribers already 😂
@user-er6ke3mm4i (8 days ago): Found this on my home feed
@123yogurt7 (8 days ago): Here on pilgrimage to the legendary course with 2,000 enrolled students
@user-mp5ku6ic4k (7 days ago): Hope subtitles can be added ~~ thanks
@BQ_Nya (9 days ago): It's here!
@hungcheng9085 (8 days ago): Checking in
@CornuDev (9 days ago): Class, GO
@MrRtyu29 (4 days ago): 2:23 I get the humor in the opening, but explaining how to interact with an assistant in slightly demeaning terms may be a bit inappropriate.
@dramajpn (8 days ago): Is the homework graded by AI too? Will it grade randomly? 😊
@founder-13th (9 days ago): A great AI mentor takes you to watch anime.
@Bhllllll (9 days ago): Awesome.
@cheinhsinliu (9 days ago): 🎉
@user-hf1nh1eo3k (6 days ago): The professor could moonlight as an anime commentator (lol)
@capybaRoger (6 days ago): If generative AI is magic, then prompt engineering is incantation-chanting (like in Mushoku Tensei). Professor, when you have time, please also share how to adjust prompting effectively to cast the spell you want.
@v6854 (9 days ago): Tarek, when are you coming to watch?
@user-oo9ui5rc2f (6 hours ago): Got sucked in by Frieren 😂
@nvsrf (9 days ago): Class
@aa38aa38 (4 days ago): Spoilers up to which episode, please?
@Jack-sk9hy (7 days ago): Grabbing a front-row seat
@princend1584 (2 hours ago): To follow the lectures, I watched Frieren episodes 1-25 in a single day. Does that make me a good student?
@dogppatrick (6 days ago): Otaku engineer: understood.
@user-mn6fy9lf1q (8 days ago): The professor actually watches it too
@yuyangdong6911 (9 days ago): Class time, class time
@mixshare (7 days ago): 🏁
@user-ul9yc2oj4q (9 days ago): First, hehe
@ErenNew787 (4 days ago): Am I here for class? No, I'm here for the anime 😂
@looprand3965 (9 days ago): Will you cover how Sora works?
@reebok122479 (9 days ago): Came in for Frieren +1
@rexclq (9 days ago): 💎
@Jack-sk9hy (7 days ago): Going to class feels a bit like following a weekly anime
@kenlee7469 (6 days ago): Looks like the professor is an anime buff too 😂
@FatSquirrel (6 days ago): What, Frieren spoilers? Then I'll go finish Frieren and come back 🤣🤣
@qaqpiano1298 (8 days ago): 6
@clusslin (8 days ago): Can AI generate a Hung-yi Lee-grade lecture?
@ccs101a01 (7 days ago): The thumbnail baited me in wwwww
@sui-chan.wa.kyou.mo.chiisai (9 days ago): Baited in by Frieren
@user-hr6cg8lc6d (11 minutes ago): Where's Frieren? Where's Frieren? 🤤
@miku3920 (9 days ago): Class is starting!
@benlee5042 (9 days ago): ☝️☝️
Generative AI Learning, Part 12: Introduction to Generative AI Studio
宝玉的技术分享
174 views Jun 27, 2023 · Generative AI Learning playlist
12. Introduction to Generative AI Studio
This course introduces Generative AI Studio, a Vertex AI product that helps you prototype and customize generative AI models so you can use their capabilities in your applications. You will learn what Generative AI Studio is, its features and options, and how to use it, through a product demo.
Course page: https://www.cloudskillsboost.google/c...
Featured playlist: 生成式AI学习 (12 videos)
0 Comments
Generative AI Learning, Part 1: Introduction to Generative AI
宝玉的技术分享
2,832 views Jun 4, 2023 · Generative AI Learning playlist
An entry-level AI course published by Google that explains what generative AI is, how to use it, and how it differs from traditional machine learning methods.
https://www.cloudskillsboost.google/c...
Featured playlist: 生成式AI学习 (12 videos)
3 Comments
@user-mb7bz2vs7u (8 months ago): Thank you, 宝玉老师.
@hayeszhang9827 (8 months ago): Thanks, 宝玉老师, for recommending such a good course (even if Google advertises like crazy at the end, haha)! The translation is crystal clear!
@bruce8787 (8 months ago): Thanks for sharing; subscribed.
Generative AI Learning, Part 2: Introduction to Large Language Models
宝玉的技术分享
730 views Jun 5, 2023 · Generative AI Learning playlist
An entry-level AI course from Google covering what large language models (LLMs) are, how to use them well, and how to use prompting and fine-tuning to improve LLM performance.
https://www.cloudskillsboost.google/c...
Featured playlist: 生成式AI学习 (12 videos)
Transcript:
Hello and welcome to Introduction to Large Language Models. My name is John Ewald, and I'm a training developer here at Google Cloud. In this course you learn to define large language models, or LLMs, describe LLM use cases, explain prompt tuning, and describe Google's gen AI development tools.

Large language models, or LLMs, are a subset of deep learning. To find out more about deep learning, see our Introduction to Generative AI course video. LLMs and generative AI intersect, and they are both a part of deep learning. Another area of AI you may be hearing a lot about is generative AI: a type of artificial intelligence that can produce new content, including text, images, audio, and synthetic data.

So what are large language models? Large language models refer to large, general-purpose language models that can be pre-trained and then fine-tuned for specific purposes. What do pre-trained and fine-tuned mean? Imagine training a dog. Often you train your dog basic commands such as sit, come, down, and stay. These commands are normally sufficient for everyday life and help your dog become a good canine citizen. However, if you need a special service dog, such as a police dog, a guide dog, or a hunting dog, you add special training. A similar idea applies to large language models. These models are trained for general purposes, to solve common language problems such as text classification, question answering, document summarization, and text generation, across industries. The models can then be tailored to solve specific problems in different fields, such as retail, finance, and entertainment, using relatively small field-specific data sets.

Let's break the concept down into the three major features of large language models. "Large" indicates two meanings: first, the enormous size of the training data set, sometimes at the petabyte scale; second, the parameter count. In ML, parameters are often called hyperparameters. Parameters are basically the memories and the knowledge that the machine learned from model training; they define the skill of a model in solving a problem, such as predicting text. "General purpose" means that the models are sufficient to solve common problems. Two reasons lead to this idea: first, the commonality of human language regardless of the specific task; second, resource restrictions: only certain organizations have the capability to train such large language models with huge data sets and a tremendous number of parameters, so why not let them create fundamental language models for others to use? This leads to the last point, "pre-trained and fine-tuned": pre-train a large language model for a general purpose with a large data set, then fine-tune it for specific aims with a much smaller data set.

The benefits of using large language models are straightforward. First, a single model can be used for different tasks. This is a dream come true: these large language models, trained with petabytes of data and comprising billions of parameters, are smart enough to solve different tasks, including language translation, sentence completion, text classification, question answering, and more. Second, large language models require minimal field training data when you tailor them to solve your specific problem. They obtain decent performance even with little domain training data; in other words, they can be used in few-shot or even zero-shot scenarios. In machine learning, "few-shot" refers to training a model with minimal data, and "zero-shot" implies that a model can recognize things that have not explicitly been taught during training. Third, the performance of large language models continuously grows as you add more data and parameters.

Let's take PaLM as an example. In April 2022, Google released PaLM, short for Pathways Language Model, a 540-billion-parameter model that achieves state-of-the-art performance across multiple language tasks. PaLM is a dense, decoder-only transformer model with 540 billion parameters. It leverages the new Pathways system, which has enabled Google to efficiently train a single model across multiple TPU v4 pods. Pathways is a new AI architecture that will handle many tasks at once, learn new tasks quickly, and reflect a better understanding of the world; the system enables PaLM to orchestrate distributed computation for accelerators. We previously mentioned that PaLM is a transformer model. A transformer model consists of an encoder and a decoder: the encoder encodes the input sequence and passes it to the decoder, which learns how to decode the representations for a relevant task.

We've come a long way, from traditional programming, to neural networks, to generative models. In traditional programming, we used to have to hard-code the rules for distinguishing a cat: type, animal; legs, 4; ears, 2; fur, yes; likes, yarn and catnip. In the wave of neural networks, we could give the network pictures of cats and dogs and ask "is this a cat?", and it would predict a cat. In the generative wave, we as users can generate our own content, whether it be text, images, audio, video, or other. For example, models like PaLM, or LaMDA (Language Model for Dialogue Applications), ingest very, very large data from multiple sources across the internet and build foundation language models we can use simply by asking a question, whether typing it into a prompt or verbally talking into the prompt. So when you ask it "what's a cat?", it can give you everything it has learned about a cat.

Let's compare LLM development using pre-trained models with traditional ML development. First, with LLM development you don't need to be an expert, you don't need training examples, and there is no need to train a model. All you need to do is think about prompt design, which is the process of creating a prompt that is clear, concise, and informative; it is an important part of natural language processing. In traditional machine learning, you need training examples to train a model, and you also need compute time and hardware.
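An illustrative aside, not part of the transcript: the contrast above can be made concrete. With a pre-trained LLM, a zero-shot classification task is expressed entirely as prompt text, with no labeled training set and no training loop. A minimal sketch; the commented-out client call is a placeholder for whatever LLM API you actually use.

```python
# Prompt design instead of model training: a zero-shot sentiment classifier.
# The task is specified in plain language; no labeled examples are needed.
def build_prompt(review: str) -> str:
    return (
        "Classify the sentiment of the following product review as "
        "positive, negative, or neutral. Reply with one word.\n\n"
        f"Review: {review}\n"
        "Sentiment:"
    )

prompt = build_prompt("The battery dies in an hour, very disappointed.")
print(prompt)
# response = some_llm_client.generate(prompt)  # placeholder: any LLM API call
```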
Let's take a look at an example of a text generation use case. Question answering, or QA, is a subfield of natural language processing that deals with the task of automatically answering questions posed in natural language. QA systems are typically trained on a large amount of text and code, and they are able to answer a wide range of questions, including factual, definitional, and opinion-based questions. The key here is that you need domain knowledge to develop these question-answering models; for example, domain knowledge is required to develop a question-answering model for customer IT support, or healthcare, or supply chain. Using generative QA, the model generates free text directly based on the context, and there is no need for domain knowledge.

Let's look at three questions given to Bard, a large language model chatbot developed by Google AI. Question one: "This year's sales are one hundred thousand dollars. Expenses are sixty thousand dollars. How much is net profit?" Bard first shares how net profit is calculated, then performs the calculation, then provides the definition of net profit. Here's another question: "Inventory on hand is six thousand units. A new order requires eight thousand units. How many units do I need to fill to complete the order?" Again, Bard answers the question by performing the calculation. And in our last example: "We have 1,000 sensors in 10 geographic regions. How many sensors do we have on average in each region?" Bard answers the question with an example of how to solve the problem and some additional context. In each of our questions a desired response was obtained; this is due to prompt design.

Prompt design and prompt engineering are two closely related concepts in natural language processing. Both involve the process of creating a prompt that is clear, concise, and informative, but there are some key differences between the two. Prompt design is the process of creating a prompt that is tailored to the specific task that the system is being asked to perform. For example, if the system is being asked to translate a text from English to French, the prompt should be written in English and should specify that the translation should be in French. Prompt engineering is the process of creating a prompt that is designed to improve performance. This may involve using domain-specific knowledge, providing examples of the desired output, or using keywords that are known to be effective for the specific system. Prompt design is a more general concept, while prompt engineering is more specialized: prompt design is essential, while prompt engineering is only necessary for systems that require a high degree of accuracy or performance.

There are three kinds of large language models: generic language models, instruction-tuned models, and dialogue-tuned models. Each needs prompting in a different way. Generic language models predict the next word based on the language in the training data; the next word is a token based on the language in the training data. In the example "the cat sat on", the next word should be "the", and you can see that "the" is the most likely next word. Think of this type as autocomplete in search. In instruction-tuned models, the model is trained to predict a response to the instructions given in the input, for example: "summarize a text of X", "generate a poem in the style of X", "give me a list of keywords based on semantic similarity for X", or, in this example, "classify the text into neutral, negative, or positive". In dialogue-tuned models, the model is trained to have a dialogue by predicting the next response. Dialogue-tuned models are a special case of instruction-tuned models where requests are typically framed as questions to a chatbot; dialogue tuning is expected to be in the context of a longer back-and-forth conversation, and typically works better with natural, question-like phrasings.
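An illustrative aside, not part of the transcript: the three model kinds map onto three prompt shapes. A small sketch of how the same idea looks in each style; the strings are examples of prompt form, not output from any particular model.

```python
# Generic (autocomplete-style): give text, expect the likeliest continuation.
generic_prompt = "The cat sat on"

# Instruction-tuned: state a task over some input.
instruction_prompt = (
    "Classify the text into neutral, negative, or positive.\n"
    "Text: I loved the new headphones, great sound.\n"
    "Classification:"
)

# Dialogue-tuned: frame the request as a turn in a conversation.
dialogue_prompt = [
    {"role": "user", "content": "Hi! Can you help me classify a review?"},
    {"role": "assistant", "content": "Sure, paste the review."},
    {"role": "user", "content": "I loved the new headphones, great sound."},
]
```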
Chain-of-thought reasoning is the observation that models are better at getting the right answer when they first output text that explains the reason for the answer. Consider the question: "Roger has five tennis balls. He buys two more cans of tennis balls. Each can has three tennis balls. How many tennis balls does he have now?" Posed initially with no worked response, the model is less likely to get the correct answer directly. However, once an answer that spells out its reasoning has been shown, by the time the second question is asked the output is more likely to end with the correct answer.
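An illustrative aside, not part of the transcript: a chain-of-thought prompt in the style just described consists of a worked exemplar whose answer shows its reasoning, followed by the new question. The exemplar below is the transcript's own tennis-ball arithmetic (5 + 2 × 3 = 11; the follow-up works out to 23 − 20 + 6 = 9).

```python
# One exemplar that reasons out loud, then the real question.
# The demonstrated style nudges the model to reason before answering.
cot_prompt = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls.
5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. It used 20 to make lunch and bought
6 more. How many apples does it have?
A:"""
print(cot_prompt)
# A capable model will typically reply with the steps and "The answer is 9."
```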
A model that can do everything has practical limitations, and task-specific tuning can make LLMs more reliable. Vertex AI provides task-specific foundation models. Say you have a use case where you need to gather sentiment, that is, how your customers are feeling about your product or service: you can use the classification-task sentiment-analysis model. The same goes for vision tasks: if you need to perform occupancy analytics, there is a task-specific model for your use case.

Tuning a model enables you to customize the model's responses based on examples of the task you want it to perform. It is essentially the process of adapting a model to a new domain, or a set of custom use cases, by training the model on new data. For example, we may collect training data and tune the model specifically for the legal or medical domain. You can also further tune the model by fine-tuning, where you bring your own data set and retrain the model by tuning every weight in the LLM; this requires a big training job and hosting your own fine-tuned model. An example is a medical foundation model trained on healthcare data, where the tasks include question answering, image analysis, finding similar patients, and so forth. Fine-tuning is expensive and not realistic in many cases. So are there more efficient methods of tuning? Yes: parameter-efficient tuning methods, or PETM, are methods for tuning a large language model on your own custom data without duplicating the model. The base model itself is not altered; instead, a small number of add-on layers are tuned, and these can be swapped in and out at inference time.
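An illustrative aside, not part of the transcript: one widely used parameter-efficient method is LoRA, which freezes the base weights and trains small low-rank adapter matrices that can be swapped per task, matching the "add-on layers" idea above. A minimal sketch using the Hugging Face transformers and peft libraries, assuming both are installed; GPT-2 stands in for whatever base model you would actually tune.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Frozen base model; only the adapters configured below will train.
base = AutoModelForCausalLM.from_pretrained("gpt2")

config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the update
    target_modules=["c_attn"],  # GPT-2's attention projection layer
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```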
Generative AI Studio lets you quickly explore and customize generative AI models that you can leverage in your applications on Google Cloud. It helps developers create and deploy generative AI models by providing a variety of tools and resources that make it easy to get started: for example, a library of pre-trained models, a tool for fine-tuning models, a tool for deploying models to production, and a community forum for developers to share ideas and collaborate.

Generative AI App Builder lets you create gen AI apps without having to write any code. It has a drag-and-drop interface that makes it easy to design and build apps, a visual editor that makes it easy to create and edit app content, a built-in search engine that allows users to search for information within the app, and a conversational AI engine that allows users to interact with the app using natural language. You can create your own chatbots, digital assistants, custom search engines, knowledge bases, training applications, and more.

The PaLM API lets you test and experiment with Google's large language models and gen AI tools. To make prototyping quick and more accessible, developers can integrate the PaLM API with Maker Suite and use it to access the API through a graphical user interface. The suite includes a number of different tools, such as a model-training tool, a model-deployment tool, and a model-monitoring tool. The model-training tool helps developers train ML models on their data using different algorithms; the model-deployment tool helps developers deploy ML models to production with a number of different deployment options; and the model-monitoring tool helps developers monitor the performance of their ML models in production, using a dashboard and a number of different metrics.

That's all for now. Thanks for watching this course, Introduction to Large Language Models.
Generative AI Learning, Part 3: Introduction to Responsible AI
宝玉的技术分享
416 views Jun 5, 2023 · Generative AI Learning playlist
3. Introduction to Responsible AI
An entry-level AI course from Google that explains what responsible AI is, why it matters, how Google implements responsible AI in its products, and Google's seven AI principles.
https://www.cloudskillsboost.google/c...
Featured playlist: 生成式AI学习 (12 videos)
Transcript:
Hello and welcome to Introduction to Responsible AI. This course will help you understand why Google has put AI principles in place, identify the need for a responsible AI practice within an organization, recognize that decisions made at all stages of a project have an impact on responsible AI, and recognize that organizations can design AI to fit their own business needs and values.

Many of us already have daily interactions with artificial intelligence, or AI, from predictions for traffic and weather to recommendations for TV shows you might like to watch next. As AI becomes more common, many technologies that aren't AI-enabled may start to seem inadequate. AI systems are now enabling computers to see, understand, and interact with the world in ways that were unimaginable just a decade ago, and these systems are developing at an extraordinary pace. Yet despite these remarkable advancements, AI is not infallible. Developing responsible AI requires an understanding of the possible issues, limitations, or unintended consequences. Technology is a reflection of what exists in society, so without good practices, AI may replicate existing issues or bias and amplify them.

But there isn't a universal definition of responsible AI, nor is there a simple checklist or formula that defines how responsible AI practices should be implemented. Instead, organizations are developing their own AI principles that reflect their mission and values. While these principles are unique to every organization, if you look for common themes you find a consistent set of ideas across transparency, fairness, accountability, and privacy.

At Google, our approach to responsible AI is rooted in a commitment to strive towards AI that's built for everyone, that's accountable and safe, that respects privacy, and that is driven by scientific excellence. We've developed our own AI principles, practices, governance processes, and tools that together embody our values and guide our approach to responsible AI. We've incorporated responsibility by design into our products and, even more importantly, our organization. Like many companies, we use our AI principles as a framework to guide responsible decision making.

We all have a role to play in how responsible AI is applied. Whatever stage of the AI process you're involved with, from design to deployment or application, the decisions you make have an impact; therefore it's important that you, too, have a defined and repeatable process for using AI responsibly. There's a common misconception with artificial intelligence that machines play the central decision-making role. In reality, it's people who design and build these machines and decide how they're used. People are involved in each aspect of AI development: they collect or create the data that the model is trained on, and they control the deployment of the AI and how it's applied in a given context. Essentially, human decisions are threaded throughout our technology products, and every time a person makes a decision, they're actually making a choice based on their own values. Whether it's the decision to use generative AI to solve a problem as opposed to other methods, or anywhere throughout the machine learning lifecycle, that person introduces their own set of values. This means that every decision point requires consideration and evaluation to ensure that choices have been made responsibly, from concept through deployment and maintenance.

Because there's potential to impact many areas of society, not to mention people's daily lives, it's important to develop these technologies with ethics in mind. Responsible AI doesn't mean focusing only on the obviously controversial use cases. Without responsible AI practices, even seemingly innocuous AI use cases, or those with good intent, could still cause ethical issues or unintended outcomes, or not be as beneficial as they could be. Ethics and responsibility are important not least because they represent the right thing to do, but also because they can guide AI design to be more beneficial for people's lives.

At Google, we've learned that building responsibility into any AI deployment makes better models and builds trust with our customers and our customers' customers. If at any point that trust is broken, we run the risk of AI deployments being stalled, unsuccessful, or, at worst, harmful to the stakeholders those products affect. This all fits into our belief at Google that responsible AI equals successful AI. We make our products and business decisions around AI through a series of assessments and reviews; these instill rigor and consistency in our approach across product areas and geographies. These assessments and reviews begin with ensuring that any project aligns with our AI principles.

While AI principles help ground a group in shared commitments, not everyone will agree with every decision made about how products should be designed responsibly. This is why it's important to develop robust processes that people can trust, so that even if they don't agree with the end decision, they trust the process that drove the decision.

In June 2018 we announced seven AI principles to guide our work. These are concrete standards that actively govern our research and product development and affect our business decisions. Here's an overview of each one.

1. AI should be socially beneficial. Any project should take into account a broad range of social and economic factors, and will proceed only where we believe that the overall likely benefits substantially exceed the foreseeable risks and downsides.
2. AI should avoid creating or reinforcing unfair bias. We seek to avoid unjust effects on people, particularly those related to sensitive characteristics such as race, ethnicity, gender, nationality, income, sexual orientation, ability, and political or religious belief.
3. AI should be built and tested for safety. We will continue to develop and apply strong safety and security practices to avoid unintended results that create risks of harm.
4. AI should be accountable to people. We will design AI systems that provide appropriate opportunities for feedback, relevant explanations, and appeal.
5. AI should incorporate privacy design principles. We will give opportunity for notice and consent, encourage architectures with privacy safeguards, and provide appropriate transparency and control over the use of data.
6. AI should uphold high standards of scientific excellence. We will work with a range of stakeholders to promote thoughtful leadership in this area, drawing on scientifically rigorous and multidisciplinary approaches, and we will responsibly share our knowledge by publishing educational materials, best practices, and research that enable more people to develop useful AI applications.
7. AI should be made available for uses that accord with these principles. Many technologies have multiple uses, so we will work to limit potentially harmful or abusive applications.

In addition to these seven principles, there are certain AI applications that we will not pursue. We will not design or deploy AI in these four application areas: technologies that cause or are likely to cause overall harm; weapons or other technologies whose principal purpose or implementation is to cause or directly facilitate injury to people; technologies that gather or use information for surveillance that violates internationally accepted norms; and technologies whose purpose contravenes widely accepted principles of international law and human rights.

Establishing principles was a starting point rather than an end. What remains true is that our AI principles rarely give us direct answers to our questions on how to build our products. They don't, and shouldn't, allow us to sidestep hard conversations. They are a foundation that establishes what we stand for, what we build, and why we build it, and they are core to the success of our enterprise AI offerings.
Generative AI Learning, Part 4: Introduction to Image Generation
宝玉的技术分享
380 views Jun 15, 2023 · Generative AI Learning playlist
4. Introduction to Image Generation
An entry-level AI course from Google introducing diffusion models, a family of machine learning models that have recently shown great promise in image generation. Diffusion models draw their inspiration from physics, specifically thermodynamics. Over the past few years they have become popular in both research and industry, and they underpin many of the state-of-the-art image generation models and tools on Google Cloud. This course introduces the theory behind diffusion models and how to train and deploy them on Vertex AI.
Course page: https://www.cloudskillsboost.google/c...
Featured playlist: 生成式AI学习 (12 videos)
Transcript:
Introduction
Hi, my name is Kyle Steckler, and I'm a machine learning engineer on the Advanced Solutions Lab team at Google Cloud. In this talk we're going to dive into an introduction to image generation; specifically, I'll provide an introduction to diffusion models, a family of models that have recently shown tremendous promise in the image generation space. That said, image generation has long been a field of interest, and there are many interesting approaches you may have heard about. While many approaches have been implemented for image generation, some of the more promising ones over time have been model families such as variational autoencoders, which encode images to a compressed size and then decode back to the original size while learning the distribution of the data itself. Generative adversarial networks, or GANs, have also been quite popular. These models are really interesting: they pit two neural networks against each other. One neural network, the generator, creates images, and the other, the discriminator, predicts if the image is real or fake. Over time the discriminator gets better and better at distinguishing between real and fake, and the generator gets better and better at creating real-looking fakes; you may have heard the term "deepfakes" before. And lastly, autoregressive models: these generate images by treating an image as a sequence of pixels, and the modern approach actually draws much of its inspiration from how LLMs, or large language models, handle text. Very interesting.

Diffusion Models
In this talk the focus is one of the newer image generation model families: diffusion models. Diffusion models draw their inspiration from physics, specifically thermodynamics. While they were first really introduced for image generation in 2015, it took a few years for the idea to take off. Within the last few years, though, from 2020 up until now, we have seen a massive increase of diffusion models in both the research space and, today, in the industry space as well. Diffusion models underpin many of the state-of-the-art image generation systems you may be familiar with today.

Use Cases
Diffusion models show promise across a number of different use cases. Unconditioned diffusion models, where the model has no additional input or instruction, can be trained from images of a specific thing, such as faces, as you can see on the slide here, and they will learn to generate new images of that thing. Another example of unconditioned generation is super-resolution, which is really powerful for enhancing low-quality images. We also have conditioned generation models, and these give us things like text-to-image, where we can generate an image from a text prompt, and other things like image inpainting and text-guided image-to-image, where we can remove or add things and edit the image itself.

Essential Idea
Now let's take a deeper dive into diffusion models and talk about how they actually work. As noted on the slide here, the essential idea is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process; really, this means adding noise iteratively to an image. We then learn a reverse diffusion process that restores structure in the data, yielding a highly flexible and tractable generative model of the data. In other words, we can add noise to an image iteratively, and we can then train a model that learns how to denoise an image, thus generating novel images.

Pure Noise
The goal is for this model to learn to denoise, to remove noise. In that respect, we could start on the left of the slide from pure noise, and from that pure noise have a model that is able to synthesize a novel image. I know there's a bit of math notation on this slide, so let's break it down a little. We start with a large data set of images, but let's just take a single image, shown on the right-hand side. We can start this forward diffusion process and go from x0, the initial image, to x1, the initial image with a little bit of noise added, and we can do this over and over again, iteratively adding more and more noise to the initial image. This distribution we call q, and it depends only on the previous step. So if we do this over and over, iteratively adding more noise, we need to think about how many times we perform that operation; the initial research paper did this 1,000 times. Ideally, with that number high enough, by the end we should reach a state of pure noise, and by that point all structure in the initial image is completely gone; we're just looking at pure noise.
5:25
obviously that's kind of the easy part
5:27
it's not too difficult to perform Q to
5:31
iteratively add more and more noise the
5:33
challenging part is how do we go from a
5:36
noisy image to a slightly less noisy
5:39
image
Reverse Diffusion Process
5:40
and so this is referred to as the reverse
5:43
diffusion process and at this stage
5:46
every step of the way every step that we
5:49
add noise we also learn the reverse
5:52
diffusion process
5:54
that is we train a machine learning
5:56
model that takes in as input the noisy
5:59
image and predicts the noise that's been
6:02
added to it
6:04
now let's look at that from a slightly
6:07
different angle
Demonstration
6:08
we can visualize a single training step
6:11
of the model here
6:12
so we have our initial image X on the
6:15
left and we sample a time step to
6:18
create a noisy image
6:20
we then send that through our denoising
6:23
model
6:24
with the goal of predicting the noise so
6:26
the output of the model is the predicted
6:28
noise
6:30
but we just added the noise to this
6:32
image we know what it is so we can
6:34
actually compare that we can see what is
6:36
the difference between the model's
6:38
predicted noise and the actual noise
6:40
that we added now this model is trained
6:44
similar to most machine learning models
6:47
that you might be familiar with to
6:49
minimize that difference and over time
6:51
after seeing enough examples this model
6:54
gets very very good at removing noise
6:57
from images
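to make this concrete, here is a minimal sketch of one such training step in TensorFlow; the model interface and the add_noise forward-process helper are illustrative assumptions, not the code behind any particular product:

    import tensorflow as tf

    def train_step(model, optimizer, images, num_timesteps=1000):
        """One denoising training step: add noise, predict it, minimize the gap."""
        batch_size = tf.shape(images)[0]
        # Pick a random timestep for each image in the batch.
        t = tf.random.uniform((batch_size,), 0, num_timesteps, dtype=tf.int32)
        noise = tf.random.normal(tf.shape(images))
        # Hypothetical helper implementing the forward diffusion process q.
        noisy_images = add_noise(images, noise, t)
        with tf.GradientTape() as tape:
            predicted_noise = model([noisy_images, t], training=True)
            # Minimize the difference between actual and predicted noise.
            loss = tf.reduce_mean(tf.square(noise - predicted_noise))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss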
Generating Images
6:59
and now for the fun part this is where
7:01
it gets really cool is we need to think
7:03
about once we train this model how do we
7:06
generate images with it well it's
7:09
actually fairly intuitive we can just
7:11
start with pure absolute noise and send
7:15
that noise through our model that is
7:17
trained
7:18
we then take the output the predicted
7:21
noise and subtract it from the initial
7:24
noise and if we do that over and over
7:26
and over again we end up with a
7:29
generated image
7:31
another way to think about this is that
7:33
the model is able to learn the real data
7:36
distribution of images that it's seen
7:38
and then sample from that learned
7:41
distribution to create new novel images
7:44
very cool
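a minimal sampling loop in that spirit might look like the sketch below; denoise_step, which applies the model's predicted noise to produce a slightly less noisy image, is a hypothetical helper standing in for the reverse-process math:

    import tensorflow as tf

    num_timesteps = 1000
    x = tf.random.normal((1, 64, 64, 3))  # start from pure noise (assumed shape)
    for t in reversed(range(num_timesteps)):
        predicted_noise = model([x, tf.constant([t])], training=False)
        # Hypothetical reverse step: remove a scaled portion of the predicted noise.
        x = denoise_step(x, predicted_noise, t)
    # After iterating back to t = 0, x is a newly generated image.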
Conclusion
7:46
as I'm sure we're all aware there have
7:48
been many advances in this space in just
7:51
the last few years
7:53
and while many of the exciting new
7:54
technologies on vertex AI for image
7:56
generation are underpinned with
7:59
diffusion models lots of work has been
8:01
done to generate images faster and with
8:04
more control
8:05
hopefully now after taking a little bit
8:07
of a look under the covers into how
8:09
diffusion models work you have a bit
8:11
better intuition as to what's actually
8:13
going on with these really new
8:15
Innovative model types
8:18
we've also seen wonderful results
8:20
combining the power of diffusion models
8:22
with the power of llms or large language
8:25
models which can really enable us to
8:28
create context aware photorealistic
8:31
images from a text prompt one great
8:35
example of this is Imagen from Google
8:37
research while it's a bit more
8:39
complicated than what we've talked
8:41
through in this session you can see that
8:43
at its core it's a composition of an llm
8:46
and a few diffusion based models this is
8:50
a really exciting space and I'm thrilled
8:52
to see this wonderful technology make
8:54
its way into Enterprise grade products
8:56
on vertex AI thanks for listening
生成式AI学习5——编码器-解码器架构(上)概述
宝玉的技术分享
4.55K subscribers
Subscribed
7
Share
Download
Clip
Save
403 views Jun 18, 2023 生成式AI学习
Transcript
0:00
hello everybody my name is Benoit Dherin
0:02
I'm a machine learning engineer at
0:04
Google's Advanced Solutions lab if you
0:07
want to know more about the advanced
0:08
Solutions lab please follow the link in
0:10
the description box below there is a lot
0:13
of excitement currently around
0:14
generative AI and new advancements
0:17
including new vertex AI features such as
0:20
Gen AI Studio Model Garden Gen AI API
0:23
our objective in this short course is
0:26
to give you a solid footing on some of
0:28
the underlying Concepts that make all
0:30
the Gen AI magic possible today I'm
0:33
going to talk about the encoder decoder
0:36
architecture which is at the core of
0:37
large language models we will start with
0:40
a brief overview of the architecture
0:42
then I'll go over how we train these
0:45
models and at last we will see how to
0:47
produce text from a trained model at
0:50
serving time
0:51
to begin with the encoder decoder
0:53
architecture is a sequence-to-sequence
0:55
architecture this means it takes for
0:58
example a sequence of words as input
1:01
like the sentence in English the cat ate
1:04
the mouse and it outputs say the
1:07
translation in French
1:10
the encoder decoder architecture is a
1:13
machine that consumes the sequences and
1:15
puts out sequences
1:17
another input example is the sequence of
1:20
words forming the prompt sent to a large
1:22
language model then the output is the
1:25
response of the large language model to
1:27
this prompt
1:29
now we know what an encoder decoder
1:31
architecture does but how does it do it
1:35
typically the encoder decoder
1:37
architecture has two stages
1:39
first an encoder stage that produces a
1:42
vector representation of the input
1:44
sentence
1:45
then this encoder stage is followed by a
1:48
decoder stage that creates the sequence
1:51
output
1:52
both the encoder and the decoder can be
1:54
implemented with different internal
1:56
architectures the internal mechanism can
1:58
be a regular neural network as shown in
2:01
this slide or a more complex Transformer
2:03
block as in the case of the super
2:05
powerful language models we see nowadays
2:08
a regular neural network encoder takes
2:10
each token in the input sequence one at
2:13
a time and produces a state representing
2:16
this token as well as all the previously
2:19
ingested tokens
2:21
then this state is used in the next
2:24
encoding step as input along with the
2:27
next token to produce the next state
2:30
once you are done ingesting all the
2:33
input tokens into the RNN you output a
2:36
vector that essentially represents the
2:39
full input sentence
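as a rough sketch, such an RNN encoder can be expressed in a few lines of Keras; the vocabulary size and layer widths below are arbitrary illustrative choices, not the course's exact configuration:

    import tensorflow as tf

    encoder = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=10_000, output_dim=256),
        tf.keras.layers.GRU(512),  # final hidden state summarizes the sentence
    ])
    # token_ids: (batch, seq_len) integer tensor of input tokens
    sentence_vector = encoder(token_ids)  # one 512-dim vector per input sentence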
2:40
that's it for the encoder what about the
2:42
decoder path
2:44
the decoder takes the vector
2:46
representation of the input sentence and
2:50
produces an output sentence from that
2:52
representation
2:54
in the case of a RNN decoder it does it
2:57
in steps decoding the output one token
3:00
at a time using the current state and
3:03
what has been decoded so far okay now
3:06
that we have a high level understanding
3:09
of the encoder decoder architecture how
3:11
do we train it
3:13
that's the training phase
3:15
to train a model you need a data set
3:18
that is a collection of input output
3:20
pairs that you want your model to
3:22
imitate
3:23
you can then feed the data set to the
3:26
model which will correct its own weights
3:29
during training on the basis of the
3:32
error it produces on a given input in
3:36
the data set
3:37
this error is essentially the difference
3:39
between what the neural networks
3:41
generates given an input sequence and
3:44
the true output sequence you have in the
3:46
data set
3:48
okay but then how do you produce this
3:51
data set in the case of the encoder
3:53
decoder architecture this is a bit more
3:55
complicated than for typical predictive
3:58
models first you need a collection of
4:01
input and output text in the case of
4:04
translation that would be sentence pairs
4:06
where one sentence is in the source
4:09
language while the other is the
4:11
translation
4:12
you'll Feed The Source language sentence
4:14
to the encoder and then compute the
4:17
error between what the decoder generates and
4:20
the actual translation
4:22
however there is a catch the decoder
4:24
also needs its own input at training
4:27
time
4:28
you need to give the decoder the correct
4:30
previous translated token as input to
4:33
generate the next token rather than what
4:36
the decoder has generated so far
4:39
this method of training is called
4:42
teacher forcing because you force the
4:45
decoder to generate the next token from
4:48
the correct previous token
4:50
this means that in your code you'll have
4:53
to prepare two input sentences the
4:56
original one fed to the encoder and also
4:59
the original one shifted to the left
5:02
that will feed to the decoder
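in other words, the data preparation for teacher forcing is just a shift; a sketch, under the common convention that the target token sequence already starts with a GO token and ends with an end-of-sentence token:

    # target_tokens: (batch, len), e.g. [GO, le, chat, a, mangé, la, souris, EOS]
    decoder_inputs  = target_tokens[:, :-1]  # what the decoder is fed at each step
    decoder_targets = target_tokens[:, 1:]   # what it must predict at each step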
5:05
another subtle point is that the
5:06
decoder generates at each step only the
5:09
probability that each token in your
5:11
vocabulary is the next one
5:14
using these probabilities you'll have to
5:16
select a word and there are several
5:18
approaches for that the simplest one
5:20
called greedy search is to generate the
5:23
token that has the highest probability
5:26
a better approach that produces better
5:28
results is called beam search in that
5:31
case you use the probabilities generated
5:34
by the decoder to evaluate the
5:37
probability of sentence chunks rather
5:40
than individual words and you keep at
5:42
each step the most likely generated
5:45
chunk
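greedy search is essentially one line; a sketch, assuming probabilities is the decoder's output distribution over the vocabulary at the current step:

    import tensorflow as tf

    # Greedy search: pick the single highest-probability token at each step.
    next_token = tf.argmax(probabilities, axis=-1)
    # Beam search would instead keep the k best partial sentences at each step,
    # extend each one, and score whole chunks rather than individual words.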
5:46
that's how training is done now let's
5:49
move on to serving
5:51
after training at serving time when you
5:54
want to say generate a new translation
5:58
or a new response to a prompt you'll
6:01
start by feeding the encoder
6:03
representation of the prompt to the
6:05
decoder along with a special token like
6:09
go
6:10
this will prompt the decoder to generate
6:13
the first word let's see in more details
6:15
what happens during the generation stage
6:18
first of all the start token needs to be
6:21
represented by a vector using an
6:23
embedding layer
6:24
then the recurrent layer will update the
6:28
previous state produced by the encoder
6:30
into a new state
6:33
this state will be passed to a dense
6:35
softmax layer to produce the word
6:37
probabilities
6:39
finally the word is generated by taking
6:42
the highest probability word with
6:44
greedy search or the highest probability
6:46
chunk with beam search at this point
6:48
you repeat this procedure for the second
6:50
word to be generated
6:52
and for the third one
6:54
until you are done
6:56
so what's next
6:58
well the difference between the
7:00
architecture we just learned about and
7:02
the ones in the large language models is
7:05
what goes inside the encoder and decoder
7:08
blocks the simple RNN network is
7:12
replaced by Transformer blocks which is
7:14
an architecture discovered here at
7:16
Google and which is based on the
7:19
attention mechanism
7:20
if you are interested in knowing more
7:22
about these topics we have two more
7:24
overview courses in that series
7:26
attention mechanism overview
7:29
and Transformer models and BERT overview
7:32
also if you liked this course today have
7:35
a look at the encoder decoder
7:37
Architecture Lab walkthrough where I'll
7:39
show you how to generate poetry in code
7:41
using the concepts that we have seen in
7:44
this overview
7:45
thanks for your time have a great day
Encoder-Decoder Architecture
This course gives you an overview of the encoder-decoder architecture, a powerful and ubiquitous machine learning architecture for sequence-to-sequence tasks such as machine translation, text summarization, and question answering. You will learn about the main components of the encoder-decoder architecture and how to train and use these models. In the accompanying lab walkthrough, you will write a simple implementation of the encoder-decoder architecture from scratch in TensorFlow for poetry generation.
After completing this course, you can earn the badge shown above! Visit your profile page to see all the badges you have earned. Boost your cloud career by showing the world the skills you have developed!
Encoder-Decoder Architecture: Overview
This module gives you an overview of the encoder-decoder architecture, a powerful and ubiquitous machine learning architecture for sequence-to-sequence tasks such as machine translation, text summarization, and question answering. You will learn about the main components of the encoder-decoder architecture and how to train and use these models.
课程地址:https://www.cloudskillsboost.google/c...
生成式AI学习6——编码器-解码器架构(下)Lab演练
宝玉的技术分享
4.55K subscribers
Subscribed
5
Share
Download
Clip
Save
221 views Jun 21, 2023 生成式AI学习
Encoder-Decoder Architecture
This course gives you an overview of the encoder-decoder architecture, a powerful and ubiquitous machine learning architecture for sequence-to-sequence tasks such as machine translation, text summarization, and question answering. You will learn about the main components of the encoder-decoder architecture and how to train and use these models. In the accompanying lab walkthrough, you will write a simple implementation of the encoder-decoder architecture from scratch in TensorFlow for poetry generation.
After completing this course, you can earn the badge shown above! Visit your profile page to see all the badges you have earned. Boost your cloud career by showing the world the skills you have developed!
Encoder-Decoder Architecture: Lab Walkthrough
You will code a simple implementation of the encoder-decoder architecture in TensorFlow for generating poetry from scratch.
课程地址:https://www.cloudskillsboost.google/c...
Featured playlist
12 videos
生成式AI学习
宝玉的技术分享
Transcript
0:00
hello everybody I'm a machine
0:03
learning engineer at Google's Advanced
0:05
solution lab if you want to know more
0:06
about what the advanced Solutions lab is
0:09
please follow the link below in the
0:11
description box
0:13
there is lots of excitement currently
0:14
around generative AI and new advancements
0:17
including new vertex AI features such as
0:20
Gen AI Studio Model Garden Gen AI API
0:24
our objective in this short session is
0:26
to give you a solid footing on some of
0:28
the underlying Concepts that make all
0:30
the Gen AI magic possible today I'll go
0:33
over the code that's complementary to
0:35
the encoder decoder architecture
0:37
overview course in the same series we will
0:40
see together how to build a poetry
0:42
generator from scratch using the encoder
0:44
decoder architecture
0:46
you'll find the setup instructions in
0:48
our GitHub repository okay let's now
0:51
have a look at the code to access our
0:54
lab
0:55
go in the asl-ml-immersion folder
0:58
then the notebooks folder
1:02
then the text models folder and in the
1:05
solutions folder you'll find the text
1:09
generation notebook that's the lab that
1:12
we'll go over today
1:17
in this lab we will Implement a
1:19
character-based text generator based on
1:22
the encoder decoder architecture
1:24
character based means that the tokens
1:27
consumed and generated by the network
1:30
are characters and not words
1:33
we will use
1:35
Shakespeare plays as a data set
1:38
they have a special structure which is
1:40
that of characters talking with each other
1:45
and here you see an example of
1:48
a piece of text that has been generated
1:50
by the trained neural network
1:53
while the sentences are not necessarily
1:55
making sense nor grammatically
1:57
correct this is remarkable in many ways
2:01
first of all remember it's character
2:03
based so it means that it learns to
2:05
predict only the most probable
2:07
characters despite that it was able to
2:09
learn pretty well the notion of words
2:12
separated by blank spaces and also the
2:15
basic structure of a play with the
2:18
characters talking to each other
2:21
it is a very small network as you will see
2:24
it's based on the RNN architecture and
2:27
only trained for 30 epochs in the vertex
2:30
AI workbench which is a pretty fast
2:32
training time
2:33
so let's look at the code now
2:42
so the first thing is to import the
2:46
libraries that we need in particular we
2:49
will code our encoder decoder
2:51
architecture using tensorflow Keras to
2:54
import that
2:57
we download our data set using
3:01
tf.keras.utils.get_file
3:02
so now the data set is on disk and we
3:05
just need to load it
3:08
into a variable called text so the text
3:12
variable now contains the whole string
3:14
representing all the plays in
3:17
that Shakespeare data set
3:19
we can have a quick look at
3:22
what it is and you see if we print the
3:25
first 250 characters you have the first
3:28
citizen speaking to
3:31
everybody and everybody else speaking
3:34
to the first citizen
3:38
this cell computes the number of unique
3:41
characters that we have in that
3:44
in the text data set and we see that we
3:47
have
3:48
65 unique characters right these
3:52
characters will be the tokens that the
3:55
neural network will consume
3:57
during training and will generating
3:59
during a serving
4:02
so the first step here now is to
4:05
vectorize the text what do we mean by
4:08
that it means that first of all
4:13
we will need to extract from the actual
4:18
string a sequence of characters which we
4:21
can do with tensorflow by using
4:24
tf.strings.unicode_split
4:26
so now
4:28
for example text
4:30
here are transformed into a list of
4:35
sequences of characters
4:39
a neural network cannot consume
4:41
immediately the characters we need to
4:43
transform that into numbers so we need
4:46
to simply map each of the characters to a
4:49
given
4:50
int
4:51
for that we have the tf.keras layer
4:54
StringLookup
4:57
to which you just need to pass the
5:01
list of your vocabulary
5:04
the 65 unique characters that we have in
5:07
our corpus
5:09
and that will produce a layer that
5:11
when passed the characters will produce
5:16
the corresponding ints so within that
5:20
layer you have a mapping that has
5:22
been generated between the characters
5:25
and the ints
5:27
to get the inverse mapping
5:30
you use the same layer string lookup
5:34
with the exact same vocabulary that you
5:37
retrieve from the first layer by using
5:39
get vocabulary
5:41
but you set that parameter to be true
5:44
invert equals true and it will
5:46
compute the inverse mapping which is the
5:49
mapping from ints to chars
5:54
right and indeed if you pass to this
5:56
mapping
5:57
a sequence of IDs it gives you
6:03
back
6:04
the corresponding characters using the
6:08
mapping that's stored in the memory of
6:10
this layer
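put together, the two mappings look roughly like the sketch below, following the lab's description; the sample string is illustrative:

    import tensorflow as tf

    vocab = sorted(set(text))  # the 65 unique characters in the corpus
    ids_from_chars = tf.keras.layers.StringLookup(vocabulary=vocab)
    chars_from_ids = tf.keras.layers.StringLookup(
        vocabulary=ids_from_chars.get_vocabulary(), invert=True)

    ids = ids_from_chars(tf.strings.unicode_split("First Citizen", "UTF-8"))
    chars = chars_from_ids(ids)  # maps the ints back to characters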
6:11
so that's that
6:15
okay now let's build a data set
6:17
that we will train our neural network
6:19
with for that
6:22
we are using the
6:24
tf.data Dataset API
6:28
which has this nice method
6:31
from_tensor_slices
6:33
which will convert that tensor of ints
6:36
that represents our whole corpus of text
6:39
of plays as ints it will store that into
6:42
a tf.data Dataset
6:44
so at this point the elements of these
6:48
data sets are just the individual
6:51
characters
6:52
so that's not great for us but we want
6:54
to feed our neural network with our
6:57
sequences of the same length but not
7:00
just one character we need to predict
7:02
the next character so
7:05
but luckily the data set API has this
7:08
nice function batch that will do exactly
7:10
that for us so if we pass
7:13
if we invoke the batch method on our ID
7:16
data set to which we pass a given
7:20
sequence length which we set to be 100
7:23
here
7:24
now the elements the data points that
7:29
are stored in our data set are no longer
7:31
characters but sequences of
7:35
100 characters so here you see
7:39
an example if we take just one element
7:42
there are no longer characters but
7:43
sequences of 100 of these
7:46
character IDs if you want not
7:48
characters
7:49
but character IDs
7:52
okay
7:54
it's not completely um we are not
7:57
completely done here we still need to
7:59
create
8:02
the input sequences that we were going
8:04
to pass to the decoder and also the
8:06
sequences that we want to predict right
8:09
and what are these sequences they are
8:11
just the sequences of the next character
8:13
in the input sequence
8:15
so for instance here if I have the
8:18
sequence tensorflow
8:20
the sequence tensorflow at the beginning
8:24
then um
8:26
the input sequence we can do from it is
8:29
tensorflo without the w and the
8:33
target sequence that we want to predict
8:36
is the same sequence but just shifted by
8:38
one on the right so
8:41
ensorflow and you see that e is the
8:45
next character for t
8:48
n is the next character for e etc so
8:53
basically this little function does
8:54
exactly that it takes an original
8:56
sequence creates
8:59
an input sequence from that by just
9:02
truncating that sequence removing the
9:04
last character and the target
9:07
sequence is created by
9:11
skipping the first character
9:15
so how we do that we just map
9:19
the split input Target function to our
9:22
sequence data set
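the helper described here is tiny; a sketch consistent with the tensorflow / tensorflo / ensorflow example above:

    def split_input_target(sequence):
        # "tensorflow" -> input "tensorflo", target "ensorflow"
        input_seq = sequence[:-1]   # drop the last character
        target_seq = sequence[1:]   # drop the first character (shift left by one)
        return input_seq, target_seq

    dataset = sequences.map(split_input_target)  # sequences: the batched id dataset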
9:24
okay
9:26
that's it now let's see
9:28
how to build the model
9:30
first off we set a number of variables
9:35
the vocabulary size
9:38
the size of the vectors we want to
9:41
represent our characters with right that
9:44
would be 256 and the number of neurons
9:47
our recurrent layer will have
9:51
for the model itself it's really really
9:54
simple model
9:55
we create it by using the Keras
9:59
subclassing API
10:01
we just create a new class called
10:03
MyModel and we subclass here
10:07
from TF Keras model
10:09
when you do that you only have to
10:13
override two functions the constructor
10:16
and the call function so let's see what
10:18
each of these function does
10:20
the first function takes essentially the
10:24
hyperparameters of your model the
10:26
vocabulary size the embedding
10:28
dimension and the
10:29
number of neurons
10:31
for your recurrent layer and it just
10:35
constructs the layers you will need and
10:38
stores them
10:42
as variables of the class
10:45
okay
10:47
now really
10:49
how these layers are connected
10:51
all that is specified in the call
10:54
function right the architecture of your
10:56
network if you if you want
11:00
and let's see what each part does here
11:02
you take the input which are sequences
11:04
of ints representing the characters
11:08
we have a first layer
11:10
that will
11:12
create for each of the
11:14
ints a vector representing them so that's
11:17
a trainable layer so as the training
11:20
progresses these vectors representing
11:23
the
11:24
characters will start to be more and
11:26
more meaningful at least that's the idea
11:29
then these static representations of the
11:33
characters are passed to the recurrent
11:35
layer that will somehow
11:37
modify these representations according to
11:41
the context of
11:44
what has been seen previously and
11:47
generate a state of what has been seen
11:50
previously that will be reused in the next
11:52
step
11:52
finally we pass the output of the
11:57
recurrent layer to a dense layer that will
11:59
output as many numbers as we
12:02
have in our vocabulary which means one
12:06
score for each of the possible 65
12:11
characters and this score
12:13
represents the probability of the
12:16
character being the next one
12:20
so that's all that model does then we
12:24
instantiate it
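for reference, a sketch of such a subclassed model (embedding, GRU, dense); the 1024 recurrent units are an assumed value, and the structure follows the standard TensorFlow text-generation pattern rather than being a copy of the lab's exact code:

    import tensorflow as tf

    class MyModel(tf.keras.Model):
        def __init__(self, vocab_size, embedding_dim, rnn_units):
            super().__init__()
            self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
            self.gru = tf.keras.layers.GRU(
                rnn_units, return_sequences=True, return_state=True)
            self.dense = tf.keras.layers.Dense(vocab_size)  # one score per character

        def call(self, inputs, states=None, return_state=False, training=False):
            x = self.embedding(inputs, training=training)
            if states is None:
                states = self.gru.get_initial_state(x)
            x, states = self.gru(x, initial_state=states, training=training)
            x = self.dense(x, training=training)
            return (x, states) if return_state else x

    # 65 unique characters plus the [UNK] token added by StringLookup
    model = MyModel(vocab_size=66, embedding_dim=256, rnn_units=1024)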
12:29
once we have done that we can look at
12:32
the structure of the model using model
12:33
summary and you see here you have the
12:36
embedding layer the recurrent layer and
12:39
the dense layer that we just
12:40
encoded
12:42
implemented in our model
12:47
that's that so let's train the model
12:49
before we train the model we need a loss
12:52
and that's the loss function that will
12:54
compare the output of the model with the
12:57
truth right since that's essentially a
13:01
classification problem with many classes
13:04
the classes being
13:06
each of the possible characters to be
13:09
the next
13:10
the loss will be the sparse categorical
13:13
cross entropy loss
13:14
and also because the neural network
13:19
outputs the logits and not directly the
13:21
probability we configure this loss to
13:25
be computed not from the probability
13:27
scores but from the logit scores
13:31
okay
13:32
once we have the loss we can compile our
13:36
model which means that basically we tie
13:39
to it a loss and also an Optimizer that
13:42
will update the weights during training
13:44
to decrease the loss as much as possible
13:48
basically it
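that is, a sketch of the loss and compile step as described:

    import tensorflow as tf

    # from_logits=True because the model outputs raw scores, not probabilities.
    loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    model.compile(optimizer="adam", loss=loss)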
13:50
then here we have a little bit of a
13:52
callback that we will use and that will
13:55
save the weights during training which
13:58
is useful
14:01
right and we are all set up now to
14:04
start the training so we do a model.fit
14:07
on the data set
14:09
we choose a number of epochs we want to
14:13
train on an epoch is a full pass over
14:18
the data set so here we will have a look
14:20
at 10 times the corpus of plays we
14:24
have
14:25
in our text vector
14:27
and we give the Callback to make sure
14:30
that the weights are saved
14:34
during the training
14:36
that's it so that's relatively
14:38
simple we train the model we have a
14:40
train model now what do we do with it
14:42
and that's a bit of a complication in
14:44
the
14:45
encoder decoder architecture is that you
14:48
cannot really immediately use your model
14:50
you need to write a sort of a decoding
14:53
function
14:54
that's here right that will decode the
14:57
generated text
14:59
a step at a time using the trained model
15:03
okay so here in this case we chose to
15:07
um
15:07
implement this decoding function
15:10
um as a Keras model so we subclass from
15:14
the tf.keras Model class
15:16
the main
15:18
method in that model is generate_one_step
15:21
let's have a quick look
15:24
to what it does so it takes the inputs
15:27
so the input can be the prompt the
15:30
initial prompt the initial sequence of
15:32
character you want the
15:34
encoder decoder model to
15:37
complete to predict to generate new
15:39
characters
15:41
you pass the input to it
15:43
it
15:44
transforms that text into a sequence of
15:46
characters then this sequence of
15:48
characters
15:49
into a sequence of ints using the
15:52
ids_from_chars layer we have set up previously and
15:56
then we call our model our encoder
15:58
decoder model that has been previously
16:00
trained
16:01
and what does it do it takes this input
16:04
of ints
16:05
and
16:07
outputs the predicted logits so the scores
16:10
for the most probable token the most
16:13
probable character in this case along
16:14
with a state that summarizes what has
16:17
been seen previously
16:20
from the predicted logits we can compute
16:23
um
16:24
we can select the most likely tokens
16:30
or characters but before doing that here
16:32
there is a little bit of a trick which
16:35
is that we divide the logits by a
16:37
temperature by a number
16:39
so basically if the temperature is one
16:42
nothing happens but if the temperature
16:44
is very high
16:49
what it will do is make the
16:52
scores associated to each of the tokens
16:55
to be predicted next relatively
16:59
similar close to zero
17:02
this means that actually more tokens will
17:05
be
17:06
more and more likely to be chosen so
17:09
there will be more variety
17:12
more stuff can be predicted if the
17:15
temperature is high so it's a bit more
17:17
creative if you have a too high
17:20
temperature of course the neural
17:23
network will just predict gibberish
17:26
okay
17:28
and if you have a low temperature
17:30
the highest probability score will be
17:32
just multiplied by a very large number
17:34
because it's divided by a small number
17:37
a number between 0 and 1 which means
17:40
that the highest score will
17:44
become much much bigger than the other
17:47
scores
17:48
giving a much higher chance to be
17:51
selected which gives you more of the
17:54
deterministic behavior okay that's the
17:57
temperature that's an important
17:58
parameter in this type of architecture
18:01
okay and that's what it does
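the temperature trick itself is a one-line division before sampling; a sketch:

    import tensorflow as tf

    # temperature > 1 flattens the scores (more variety, eventually gibberish);
    # temperature < 1 sharpens them (more deterministic output).
    scaled_logits = predicted_logits / temperature
    next_id = tf.random.categorical(scaled_logits, num_samples=1)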
18:03
okay so now we have the predicted logits
18:06
we use tf.random.categorical to just
18:09
sample from these probability scores
18:13
the most likely IDs to be next
18:18
we transform that back to a
18:21
character and that's what we return
18:23
okay so that's essentially what the
18:25
decoding function does and most decoding
18:28
functions have the very same structure
18:30
there is also this temperature trick
18:32
that you can see as a parameter
18:36
in the case of large language models
18:41
okay so let's use our decoding
18:44
function so typically you use that in a
18:46
loop so here we are going to predict
18:48
1000 characters by repeatedly making a
18:52
call to the decoding function
18:55
generate_one_step to which you feed
19:00
what has been predicted before along
19:04
with the state summarizing what happened
19:07
before and it predict the next character
19:10
along with the new state and we start
19:12
the process with a sort of a prompt
19:15
here that's
19:16
Romeo what are you gonna say and then
19:21
the neural net let's see what the neural
19:23
net generates Romeo says no good coriola
19:26
had leaves take her fiddle and if I seem
19:30
to my love you
19:32
so you see it doesn't make a
19:35
lot of sense here remember I've trained
19:38
it only a few minutes
19:41
on a Vertex AI
19:45
Workbench
19:47
which is great by the way but here
19:50
that's a small instance with just one
19:52
GPU so it was a very small training the
19:55
model
19:57
is written in a few lines but yet you
20:00
still see that it can really pick up a
20:03
lot of things in the structure of
20:07
the input data it detects patterns
20:09
that you have characters so Romeo that
20:13
was our input
20:15
but then Leontes was generated by the
20:18
network and then what Leontes
20:20
says also
20:23
okay
20:25
that's it
20:26
if you like this presentation you'll
20:29
find more in our ASL GitHub repository
20:31
with 90 plus machine learning
20:33
notebooks don't forget if you find it
20:35
useful please star the repo thanks for your
20:38
time
生成式AI学习7——注意力机制
宝玉的技术分享
4.55K subscribers
Subscribed
3
Share
Download
Clip
Save
224 views Jun 25, 2023 生成式AI学习
This course introduces you to the attention mechanism, a powerful technique that lets neural networks focus on specific parts of an input sequence. You will learn how attention works and how it can be used to improve performance on a variety of machine learning tasks, including machine translation, text summarization, and question answering.
课程地址:https://www.cloudskillsboost.google/c...
Featured playlist
12 videos
生成式AI学习
宝玉的技术分享
Transcript
0:00
hi I'm sanjana Reddy a machine learning
0:03
engineer at Google's Advanced Solutions
0:05
lab
0:06
there's a lot of excitement currently
0:07
around generative Ai and new
0:10
advancements including new vertex AI
0:12
features such as gen AI Studio model
0:15
Garden gen AI API our objective in this
0:19
short session is to give you a solid
0:22
footing on some of the underlying
0:24
Concepts that make all the Gen AI magic
0:26
possible
0:28
today I'm going to talk about the
0:30
attention mechanism that is behind all
0:33
the Transformer models and which is core
0:36
to the llm models
0:38
let's say you want to translate an
0:40
English sentence the cat ate the mouse
0:42
to French
0:44
you could use an encoder decoder this is
0:48
a popular model that is used to
0:50
translate sentences the encoder decoder
0:53
takes one word at a time and translates
0:56
it at each time step however sometimes
0:59
the words in the source language do not
1:02
align with the words in the target
1:04
language here's an example take the
1:07
sentence black cat ate the mouse
1:10
in this example the first English word
1:13
is black
1:14
however in the translation the first
1:17
French word is chat which means cat in
1:21
English
1:21
so how can you train a model to focus
1:24
more on the word cat instead of the word
1:27
black at the first time step
1:30
to improve the translation you can add
1:32
what is called the attention mechanism
1:34
to the encoder decoder
1:37
attention mechanism is a technique that
1:40
allows the neural network to focus on
1:42
specific parts of an input sequence
1:45
this is done by assigning weights to
1:48
different parts of the input sequence
1:50
with the most important parts receiving
1:53
the highest weights
1:55
this is what a traditional RNN based
1:58
encoder decoder looks like the model
2:01
takes one word at a time as input
2:03
updates the hidden State and passes it
2:07
on to the next time step
2:09
in the end only the final hidden state
2:13
is passed on to the decoder
2:15
the decoder works with the final hidden
2:18
state for processing and translates it
2:21
to the target language
2:24
an attention model differs from the
2:27
traditional sequence to sequence model
2:28
in two ways first the encoder passes a
2:33
lot more data to the decoder
2:35
so instead of just passing the final
2:37
hidden state number three to the decoder
2:40
the encoder passes all the hidden States
2:43
from each time step
2:45
this gives the decoder more context
2:48
Beyond just the final hidden state
2:51
the decoder uses all the hidden State
2:54
information to translate the sentence
2:56
the second change that the attention
2:58
mechanism brings is adding an extra step
3:02
to the attention decoder before
3:04
producing its output
3:06
let's take a look at what these steps
3:08
are
3:10
to focus only on the most relevant parts
3:12
of the input the decoder does The
3:15
Following
3:17
first it looks at the set of encoder
3:20
states that it has received
3:22
each encoder hidden state is associated
3:25
with a certain word in the input
3:27
sentence second it gives each hidden
3:30
State a score
3:32
third it multiplies each hidden state by
3:35
its soft Max score as shown here
3:38
thus amplifying hidden states with the
3:42
highest scores
3:43
and downsizing hidden states with low
3:45
scores
3:47
if we connect all of these pieces
3:49
together we're going to see how the
3:51
attention network works
3:53
before moving on let's define some of
3:55
the notations on this slide
3:57
Alpha here represents the attention
4:00
weight at each time step H represents
4:03
the hidden state of the encoder RNN at
4:06
each time step
4:08
H subscript D represents the hidden
4:11
state of the decoder RNN at each time
4:14
step
4:15
with the attention mechanism the
4:17
inversion of the black cat translation
4:19
is clearly visible in the attention
4:22
diagram
4:23
ate translates as two words a mangé in
4:27
French we can see the attention network
4:30
staying focused on the word ate for
4:33
two time steps
4:34
during the attention step we use the
4:37
encoder hidden States and the H4 Vector
4:40
to calculate a context Vector A4 for
4:44
this time step this is the weighted sum
4:48
we then concatenate H4 and A4 into one
4:52
vector
4:53
this concatenated Vector is passed
4:56
through a feed forward neural network
4:58
one trained jointly with the model to
5:01
predict the next word
5:04
the output of the feed forward neural
5:06
network indicates the output word of
5:09
this time step this process continues
5:12
till the end of sentence token is
5:15
generated by the decoder this is how you
5:18
can use an attention mechanism to
5:20
improve the performance of a traditional
5:22
encoder decoder architecture
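a sketch of one attention step as described, using a simple dot-product score (the course does not pin down the exact scoring function, so this is one common choice); feed_forward stands for the jointly trained network mentioned above:

    import tensorflow as tf

    # encoder_states: (seq_len, hidden) - one hidden state per input word
    # h_d: (hidden,) - the decoder hidden state at this time step
    scores = tf.linalg.matvec(encoder_states, h_d)   # one score per input word
    alpha = tf.nn.softmax(scores)                    # attention weights
    context = tf.linalg.matvec(                      # weighted sum of the states
        encoder_states, alpha, transpose_a=True)

    combined = tf.concat([h_d, context], axis=0)     # concatenate state + context
    next_word_logits = feed_forward(combined)        # hypothetical jointly trained net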
5:25
thank you so much for listening
生成式AI学习8——Transformer模型和BERT模型(上)概述
宝玉的技术分享
4.55K subscribers
Subscribed
7
Share
Download
Clip
Save
532 views Jun 25, 2023 生成式AI学习
Transformer Models and BERT Model
This course introduces you to the Transformer architecture and the Bidirectional Encoder Representations from Transformers (BERT) model. You will learn about the main components of the Transformer architecture, such as the self-attention mechanism, and how it is used to build the BERT model. You will also learn about the different tasks that BERT can be used for, such as text classification, question answering, and natural language inference.
In this course, you will learn about the main components of the Transformer architecture, such as the self-attention mechanism, and how it is used to build the BERT model. You will also learn about the different tasks that BERT can be used for, such as text classification, question answering, and natural language inference.
课程地址:https://www.cloudskillsboost.google/c...
Featured playlist
12 videos
生成式AI学习
宝玉的技术分享
Transcript
0:00
hi I'm sanjana Reddy a machine learning
0:03
engineer at Google's Advanced Solutions
0:05
lab
0:06
there's been a lot of excitement around
0:08
generative Ai and all the new
0:10
advancements including new vertex AI
0:12
features that are coming up such as gen
0:15
AI Studio Model Garden Gen AI API
0:20
our objective in this short session is
0:23
to give you a solid footing on some of
0:25
the underlying Concepts that make all
0:28
the Gen AI magic possible
0:30
today I'm going to talk about
0:33
Transformer models and the Bert model
0:37
language modeling has evolved over the
0:39
years
0:40
the recent breakthroughs in the past 10
0:42
years include the usage of neural
0:45
networks to represent text such as
0:48
Word2Vec and n-grams in 2013
0:51
in 2014 the development of sequence to
0:55
sequence models such as rnns and lstms
0:58
helped improve the performance of ml
1:01
models on NLP tasks such as translation
1:04
and text classification
1:07
in 2015 the excitement came with
1:10
attention mechanisms and the models
1:13
built based on it such as Transformers
1:15
and the BERT model
1:17
in this presentation we'll focus on
1:20
Transformers
1:22
Transformers is based on a 2017 paper
1:25
named attention is all you need
1:29
although all the models before
1:31
Transformers were able to represent
1:33
words as vectors these vectors did not
1:37
contain the context
1:39
and the usage of words changes based on
1:42
the context for example Bank in
1:45
Riverbank versus Bank in bank robber
1:48
might have the same Vector
1:50
representation before attention
1:53
mechanisms came about a Transformer is
1:56
an encoder decoder model that uses the
1:59
attention mechanism
2:01
it can take advantage of parallelization
2:04
and also process a large amount of data
2:07
at the same time because of its model
2:10
architecture
2:11
attention mechanism helps improve the
2:14
performance of machine translation
2:16
applications
2:17
Transformer models were built using
2:20
attention mechanisms at the core
2:23
a Transformer model consists of encoder
2:27
and decoder
2:28
the encoder encodes the input sequence
2:31
and passes it to the decoder and the
2:34
decoder
2:35
decodes the representation for a relevant
2:38
task
2:40
the encoding component is a stack of
2:42
encoders of the same number the research
2:46
paper that introduced Transformers
2:47
Stacks 6 encoders on top of each other
2:51
six is not a magical number it's just a
2:54
hyper parameter
2:56
the encoders are all identical in
2:58
structure but with different weights
3:01
each encoder can be broken down into two
3:04
sub-layers
3:05
the first layer is called the
3:07
self-attention
3:09
the input of the encoder first flows
3:11
through a self-attention layer which
3:14
helps the encoder look at relevant parts
3:16
of the words as it encodes a certain word
3:19
in the input sentence
3:22
and the second layer is called a feed
3:24
forward layer the output of the
3:27
self-attention layer is fed to the feed
3:29
forward neural network
3:31
the exact same feed forward neural
3:33
network is independently applied to each
3:36
position
3:38
the decoder has both the self-attention
3:41
and the feed forward layer but between
3:44
them is the encoder decoder attention
3:46
layer that helps the decoder focus on
3:49
relevant parts of the input sentence
3:53
after embedding the words in the input
3:55
sequence each of the embedding Vector
3:58
flows through the two layers of the
3:59
encoder
4:01
the word at each position passes through
4:04
a self-attention process then it passes
4:07
through a feed-forward neural network
4:09
the exact same network with each Vector
4:12
flowing through it separately
4:15
dependencies exist between these paths
4:18
in the self-attention layer
4:21
however the feed forward layer does not
4:24
have these dependencies and therefore
4:26
various paths can be executed in
4:29
parallel while they flow through the
4:32
feed forward layer
4:34
in the self-attention layer the input
4:36
embedding is broken up into query key
4:40
and value vectors
4:42
these vectors are computed using weights
4:45
that the Transformer learns during the
4:47
training process
4:49
all of these computations happen in
4:52
parallel in the model in the form of
4:54
Matrix computations
4:56
once we have the query key and value
4:58
vectors the next step is to multiply
5:01
each value vector by the softmax score
5:05
in preparation to sum them up the
5:08
intuition here is to keep intact the
5:11
values of the words you want to focus on
5:13
and leave out irrelevant words by
5:16
multiplying them by tiny numbers like
5:19
0.001 for example
5:22
next we have to sum up the weighted
5:25
value vectors
5:26
which produces the output of the
5:28
self-attention layer at this position
5:30
for the first word you can send along
5:33
the resulting Vector to the feedforward
5:36
neural network
5:38
to sum up this process of getting the
5:41
final embeddings these are the steps
5:43
that we take
5:44
we start with the natural language
5:46
sentence
5:48
embed each word in the sentence
5:51
after that we perform multi-headed
5:54
attention eight times in this case and
5:58
multiply this embedded word with the
6:00
respective weighted matrices
6:02
we then calculate the attention using
6:05
the resulting qkv matrices
6:09
finally we concatenate the matrices to
6:13
produce the output Matrix which is the
6:16
same Dimension as the final Matrix that
6:19
this layer initially got
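the computation just described corresponds to the scaled dot-product attention formula from the paper; a sketch:

    import tensorflow as tf

    def scaled_dot_product_attention(q, k, v):
        # softmax(Q K^T / sqrt(d_k)) V, one matrix operation per attention head.
        d_k = tf.cast(tf.shape(k)[-1], tf.float32)
        scores = tf.matmul(q, k, transpose_b=True) / tf.sqrt(d_k)
        weights = tf.nn.softmax(scores, axis=-1)  # irrelevant words get tiny weights
        return tf.matmul(weights, v)              # weighted sum of the value vectors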
6:22
there's multiple variations of
6:24
Transformers out there now
6:26
some use both the encoder and the
6:28
decoder component from the original
6:30
architecture some use only the encoder
6:33
and some use only the decoder
6:36
a popular encoder only architecture is
6:39
Bert
6:40
Bert is one of the trained Transformer
6:43
models Bert stands for bi-directional
6:46
encoder representations from
6:48
Transformers and was developed by Google
6:51
in 2018.
6:54
since then multiple variations of BERT
6:57
have been built today BERT powers Google
7:00
search
7:01
you can see how different the results
7:03
provided by Bert are for the same search
7:06
query before and after
7:09
it was trained in two variations one
7:13
model is BERT Base which had a stack of 12
7:16
Transformer layers with approximately
7:19
110 million parameters and the other
7:23
BERT Large with 24 layers of
7:26
Transformers with about 340 million
7:28
parameters
7:30
the BERT model is powerful because it
7:33
can handle long input context
7:36
it was trained on the entire Wikipedia
7:38
Corpus and Books Corpus
7:41
the BERT model was trained for 1 million
7:43
steps BERT is trained on different tasks
7:47
which means it has multi-task objective
7:49
this makes Bert very powerful
7:53
because of the kind of tasks it was
7:55
trained on it works at both a sentence
7:57
level and at a token level
8:00
these are the two different versions of
8:03
Bert that were originally released one
8:05
is BERT Base which had 12 layers whereas
8:09
BERT Large had 24 layers compared to
8:12
the original Transformer which had six
8:14
layers
8:16
the way that Bert works is that it was
8:19
trained on two different tasks task one
8:22
is called a masked language model where
8:25
the sentences are masked and the model
8:27
is trained to predict the masked words
8:31
if you were to train BERT from scratch
8:33
you would have to mask a certain
8:35
percentage of the words in your Corpus
8:38
the recommended percentage for masking
8:40
is 15 percent
8:42
the masking percentage achieves a
8:45
balance between too little and too much
8:47
masking too little masking makes the
8:50
training process extremely expensive and
8:53
too much masking removes the context
8:56
that the model requires
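a sketch of the masking step, assuming integer token IDs and a special mask token ID (the actual BERT recipe has additional details, such as sometimes keeping or replacing the token instead of masking it):

    import tensorflow as tf

    def mask_tokens(token_ids, mask_token_id, rate=0.15):
        # Randomly replace ~15% of tokens; the model must predict the originals.
        mask = tf.random.uniform(tf.shape(token_ids)) < rate
        return tf.where(mask, mask_token_id, token_ids), mask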
8:58
the second task is to predict the next
9:01
sentence
9:02
for example the model is given two sets
9:05
of sentences Bert aims to learn the
9:08
relationships between sentences and
9:10
predict the next sentence given the
9:12
first one
9:13
for example sentence a could be a man
9:17
went to the store and sentence B is he
9:20
bought a gallon of milk
9:22
BERT is responsible for classifying if
9:25
sentence B is the next sentence after
9:28
sentence a this is a binary
9:31
classification task
9:33
this helps BERT perform at a sentence
9:36
level
9:37
in order to train Bert you need to feed
9:40
three different kinds of embeddings to
9:42
the model
9:43
for the input sentence you get three
9:46
different embeddings token segment and
9:49
position embeddings
9:51
the token embeddings is a representation
9:54
of each token as an embedding in the
9:57
input sentence
9:59
the words are transformed into Vector
10:02
representations of certain dimensions
10:05
BERT can solve NLP tasks that involve
10:08
text classification as well
10:10
an example is to classify whether two
10:13
sentences say my dog is cute and he
10:16
likes playing are semantically similar
10:18
the pairs of input texts are simply
10:21
concatenated and fed into the model how
10:24
does BERT distinguish the input in a
10:27
given pair the answer is to use segment
10:30
embeddings
10:31
there is a special token represented by
10:34
SEP that separates the two different
10:37
splits of the sentence
10:39
another problem is to learn the order of
10:42
the words in the sentence
10:44
as you know BERT consists of a stack of
10:47
Transformers Bert is designed to process
10:50
input sequences up to a length of 512.
10:54
the order of the input sequence is
10:57
incorporated into the position
10:59
embeddings this allows Bert to learn a
11:02
vector representation for each position
11:06
BERT can be used for different
11:08
Downstream tasks although Bert was
11:11
trained on masked language modeling and
11:14
single sentence classification it can be
11:17
used for popular NLP tasks like single
11:20
sentence classification sentence pair
11:23
classification question answering and
11:26
single sentence tagging tasks
11:28
thank you for listening
生成式AI学习9——Transformer模型和BERT模型(下)演示
宝玉的技术分享
4.55K subscribers
Subscribed
2
Share
Download
Clip
Save
194 views Jun 26, 2023 生成式AI学习
Transformer Models and BERT Model
This course introduces you to the Transformer architecture and the Bidirectional Encoder Representations from Transformers (BERT) model. You will learn about the main components of the Transformer architecture, such as the self-attention mechanism, and how it is used to build the BERT model. You will also learn about the different tasks that BERT can be used for, such as text classification, question answering, and natural language inference.
In this course, you will learn about the main components of the Transformer architecture, such as the self-attention mechanism, and how it is used to build the BERT model. You will also learn about the different tasks that BERT can be used for, such as text classification, question answering, and natural language inference.
课程地址:https://www.cloudskillsboost.google/c...
Featured playlist
12 videos
生成式AI学习
宝玉的技术分享
Transcript
0:00
hi I'm sanjana Reddy a machine learning
0:03
engineer at Google's Advanced Solutions
0:05
lab
0:06
welcome to the lab walkthrough for
0:09
Transformer models and BERT model
0:12
in this lab walkthrough we'll be going
0:14
through classification using a
0:16
pre-trained Bert model you'll find the
0:19
setup instruction in our GitHub
0:21
repository
0:22
let's get started
0:24
in order to work on this notebook you'll
0:26
need to log into Google Cloud go into
0:29
vertex AI
0:31
and click on workbench
0:33
make sure that you have a notebook
0:35
created once the notebook instance has
0:37
been created click on open JupyterLab
0:41
once you've followed the instructions in
0:44
our GitHub repository navigate to
0:46
classify text with Bert
0:48
in this notebook we're going to learn
0:51
how to load a pre-trained Bert model
0:53
from tensorflow hub
0:55
and build our own classification using
0:58
the pre-trained BERT model we learn how
1:00
to train a BERT model by fine tuning it
1:04
before you get started note that this
1:07
notebook requires a GPU because the
1:10
training does take quite a bit of time
1:14
when you open this notebook there is a
1:16
setup instruction in order to set up a
1:18
BERT kernel to install all the required
1:21
libraries for this notebook
1:24
for this notebook we're going to be
1:26
using tensorflow and tensorflow hub
1:29
tensorflow text which is required to
1:32
pre-process the input for the BERT model
1:36
you can see that I'm checking if a GPU
1:38
is attached and I see that I have one
1:40
GPU attached to this notebook
1:43
in this notebook we're going to train a
1:46
sentiment analysis model to classify
1:48
movie reviews as either being positive
1:51
or negative based on the text of the
1:54
review
1:55
we're going to be working with the IMDb
1:57
data set that you can download from this
2:00
URL
2:03
once we have downloaded the data set we
2:07
can examine the data to see what's in it
2:09
we see that we have
2:12
25,000 files that belong to two classes
2:14
positive and negative
2:16
and we're going to be using 20 000 files
2:19
for training and the remaining 5000 for
2:22
testing
2:24
a sample of this data set shows you the
2:27
movie review over here and an Associated
2:31
label so for the one over here we see
2:34
that the label that is associated is
2:36
negative
2:37
and the one below here it's positive
2:42
once we've examined our data set and
2:45
we're happy with it we're going to load
2:47
a pre-trained Bert model from tensorflow
2:49
hub
2:51
tensorflow Hub offers multiple different
2:54
variations of BERT models in all
2:56
different sizes we're going to use a
2:59
small BERT for today's notebook
3:02
so this BERT model has four different
3:05
layers it has 512 hidden units and it
3:09
has eight attention heads
3:11
for every BERT model that we load from
3:14
tensorflow Hub it is associated with a
3:18
corresponding pre-processing model
3:20
you can find the corresponding
3:22
pre-processing model on tensorflow HUB
3:24
as well
3:28
we're going to examine the
3:29
pre-processing model so we have we're
3:32
going to load the pre-processing model
3:33
we see in the previous step and we pass
3:36
a sample text over here so we just
3:39
passed this is an amazing movie and
3:41
we're going to examine the output the
3:43
pre-processing model gives us multiple
3:46
outputs the first is the input word IDs
3:50
the input word IDs are the IDs of the words
3:53
in the tokenized sentence
3:56
the pre-processing model also provides
3:59
us the masking for each word
4:02
every sentence is converted into a fixed
4:06
length input and it masks words that are
4:09
not valid
4:14
so once we have pre-processed our input
4:16
text we can use the loaded BERT model
4:19
from tensorflow hub
4:21
in this particular cell block it doesn't
4:24
really make any sense because we've not
4:26
trained the model this is just a random
4:28
list of numbers at this point but once
4:31
you pass the pre-process text into this
4:33
bird model you get certain embeddings
4:41
so in order to Define our classification
4:43
model we start with an input layer
4:47
the input layer takes the raw text as
4:50
input passes it on to the pre-processing
4:53
layer for pre-processing that converts
4:55
it into token IDs word IDs and mask IDs
4:59
the pre-processed words are then passed
5:02
to the pre-trained model
5:05
there is an argument here called
5:07
trainable trainable here determines if
5:10
you want to fine-tune the pre-trained
5:12
model
5:13
using the new data that you're training
5:15
with or not in our example we are
5:18
setting trainable to true
5:20
which means that we're going to update
5:22
the initial weights of the pre-trained
5:24
model
5:26
your decision to set this to true or
5:28
false depends on two things
5:31
whether you want to update the weights
5:34
and second on the size of your data set
5:36
if you have a relatively small data set
5:39
it is recommended to set this to false
5:42
so that you're not introducing noise
5:44
into the pre-trained weights but if you
5:47
have a large enough data set
5:49
you can set this to true
5:52
once we have our pre-trained model we
5:54
pass it through a dense layer
5:57
to get probabilities for each of our
5:59
classes
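a sketch of that model definition, using TF Hub handles for a small BERT of the size mentioned (4 layers, 512 hidden units, 8 heads); the exact handles, versions, and head structure in the lab may differ:

    import tensorflow as tf
    import tensorflow_hub as hub
    import tensorflow_text  # registers the ops the preprocessing model needs

    preprocess = hub.KerasLayer(
        "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
    encoder = hub.KerasLayer(
        "https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/2",
        trainable=True)  # trainable=True fine-tunes the pre-trained weights

    text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name="text")
    outputs = encoder(preprocess(text_input))
    x = tf.keras.layers.Dropout(0.1)(outputs["pooled_output"])
    logits = tf.keras.layers.Dense(1)(x)  # one logit: positive vs negative review
    model = tf.keras.Model(text_input, logits)

    model.compile(
        optimizer=tf.keras.optimizers.Adam(3e-5),
        loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
        metrics=[tf.keras.metrics.BinaryAccuracy()])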
6:02
this is what the output from the model
6:04
is going to look like the output is
6:06
going to be a probability of whether
6:08
this particular sentence is true is
6:11
positive or negative
6:15
since we're working with a binary
6:17
classification problem we're going to
6:19
use binary cross entropy as our loss
6:22
function and the metric to optimize for
6:25
is going to be binary accuracy
6:28
we're initializing our training by
6:31
defining the optimizer in our case we're
6:34
using Adam which is a popular choice for
6:37
neural network models
6:40
once we've initialized the training we
6:43
can start training using model.fit we
6:46
can pass the train data set and the
6:48
validation data set and the number of
6:50
epochs that we want to train for
6:55
once the model has trained let's evaluate
6:57
the performance of the model so in our
7:00
case we see that the model achieved an
7:02
accuracy of 85 percent which is pretty
7:05
decent considering we only trained it
7:07
for five epochs
7:09
you can potentially plot the accuracy
7:13
and loss over time in order to visualize
7:15
the model's performance
7:17
we see that the training loss is going
7:20
down and we could work on our
7:23
validation loss a little bit but for the
7:26
sake of demonstration we've only trained
7:28
it for five epochs
7:31
once you're satisfied with the model
7:33
that you've trained you can save the
7:35
model using
7:37
model.save model.save exports your
7:40
tensorflow model to a local path so the
7:43
export path in this line is going to be
7:47
a path in your notebook instance
7:51
once you've saved your model you can
7:53
load it to get predictions so in this
7:56
example we have this is such an amazing
7:59
movie this movie was great the movie was
8:02
okay-ish the movie was terrible and we
8:05
get predictions for each of these
8:06
sentences based on the model that we
8:09
have trained
8:11
if you would like to take this further
8:13
and deploy your model on vertex AI to
8:16
get online predictions you could take
8:18
the locally saved model and Export it to
8:22
vertex AI
8:25
in order to do this you need to check
8:27
the signature of the model to see how
8:30
you can pass predictions to the model
8:32
the signature of the model shows you
8:34
what is the first layer that is taking
8:37
input
8:40
so once we have the locally saved model
8:43
we are going to push the model to
8:46
vertex's model registry
8:49
using these commands
8:52
in order to put the model in vertex's
8:55
model registry you need to ensure that
8:57
you have a Google Cloud Storage bucket
8:59
and these lines over here let you create
9:02
a bucket if it doesn't already exist
9:05
we're going to copy the locally saved
9:07
model using gsutil cp which takes the
9:11
locally saved model from the export path
9:13
and puts it in the Google Cloud Storage
9:16
bucket
9:17
once the model is in the Google Cloud
9:19
Storage bucket we're going to upload it
9:22
to vertex ai's model registry
9:24
we're using the python SDK in this case
9:27
so we have
9:30
aiplatform.Model.upload which takes the
9:32
model from Google Cloud Storage bucket
9:34
and puts it in the model registry
9:38
once the model has been uploaded we are
9:41
ready to deploy the model on vertex and
9:44
get online predictions
9:47
in order to do this we can use the
9:49
python SDK again so we can use
9:53
uploaded_model.deploy which is a function
9:55
that is going to do two things one it's
9:57
going to create an endpoint
10:00
and two it's going to upload the model
10:02
to this particular endpoint
10:05
so you can see here that it's creating
10:07
the endpoint providing you the endpoint
10:09
location and then once the endpoint has
10:12
been created the model is deployed to
10:15
this endpoint
10:16
this step is going to take around 5 to
10:19
10 minutes when you run through your
10:21
notebook so just don't worry if it takes
10:24
too long
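a sketch of that flow with the Vertex AI Python SDK; the project, bucket path, display name, and serving container are placeholders, not the lab's actual values:

    from google.cloud import aiplatform

    aiplatform.init(project="YOUR_PROJECT", location="us-central1")

    uploaded_model = aiplatform.Model.upload(
        display_name="bert-sentiment",
        artifact_uri="gs://YOUR_BUCKET/bert_model",  # the copied SavedModel
        serving_container_image_uri=(
            "us-docker.pkg.dev/vertex-ai/prediction/tf2-gpu.2-8:latest"),
    )
    # deploy() creates an endpoint and deploys the model to it (takes minutes)
    endpoint = uploaded_model.deploy(machine_type="n1-standard-4")

    # Once deployed, predictions use the input layer's name from the signature:
    predictions = endpoint.predict(
        instances=[{"text": "I love the movie and highly recommend it"}])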
10:25
once the model has been deployed to the
10:28
endpoint you're ready to get predictions
10:30
from this endpoint
10:32
so you can create an instance object
10:36
so using the signature of the model we
10:38
know that the name of the first input
10:40
layer is text so we're going to pass our
10:44
review text to this particular key
10:48
we create these instances object that is
10:51
going to be passed to the
10:52
endpoint.predict function
10:54
and the endpoint our predict function is
10:56
going to take this instance and it's
10:58
going to give us predictions so we can
11:00
see for our first instance I love the
11:03
movie and highly recommend it we have a
11:05
prediction of 0.99 for it was an okay
11:09
movie in my opinion we have 84 percent
11:11
and for I hated the movie we have two
11:14
percent which means it's a negative
11:15
sentiment
11:17
so this is how you can create a
11:19
classification model from a pre-trained
11:22
BERT model and then deploy it on Vertex
11:24
to get online predictions
Generative AI Learning 10 - Create Image Captioning Models, Part 1: Overview
宝玉的技术分享
4.55K subscribers
161 views Jun 26, 2023 Generative AI Learning
10. Create Image Captioning Models | Part 1: Overview
Transcript
0:01
hi everyone I am Takumi, a machine learning
0:04
engineer at Google Advanced Solutions
0:06
Lab
0:09
currently a lot of people are talking
0:11
about generative Ai and its new
0:13
advancement
0:15
and as some of you may know Google and
0:18
Google Cloud also released so many
0:20
generative AI related new products and
0:24
features
0:26
but in this video series our goal is not
0:29
to create state-of-the-art generative AIs
0:33
nor to introduce Google Cloud's new
0:35
products
0:37
instead we will explain what kind of
0:40
technology is working behind them
0:43
and especially in this video I'm going
0:46
to talk about how to actually create a
0:49
very simple generative model, an image
0:51
captioning model
0:53
by using technologies like encoder-
0:56
decoder, attention mechanism, and a bit of
0:59
Transformer
1:01
if you're not very familiar with these
1:03
concepts I recommend checking other
1:05
videos talking about them before this
1:10
okay so if you're ready let's talk about
1:13
image captioning tasks and data set at
1:16
first
1:18
we're going to use this kind of data set
1:21
as you can see there are a lot of pairs
1:23
of images and Text data
1:26
and our goal is to build and train a
1:28
model that can generate these kind of
1:31
text captions based on images
1:34
and we'll make it happen by building
1:36
this kind of model
1:38
as you can see it is kind of encoder
1:41
decoder model
1:42
but in this case encoder and decoder
1:45
handle different modality of data which
1:48
is image and text
1:50
so we pass images to encoder at first
1:54
and it extracts information from the
1:56
images and create some feature vectors
2:00
and then the vectors are passed to the
2:03
decoder which actually builds captions by
2:07
generating words one by one
2:09
so this encoder part is easy you can use
2:12
any kind of image backbone like ResNet,
2:16
EfficientNet or Vision Transformer
2:20
what we want to do here is to extract
2:22
features by using the backbones
2:25
so the code is very simple too
2:28
in terms of the code we're going to see
2:30
the entire notebook example in the next
2:32
video so here let's just focus on some
2:36
important lines
2:38
as you can see we are using classical
2:41
Inception ResNet V2 here
2:44
from Keras applications
2:47
but again this can be any image
2:49
backbones
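A sketch of such an encoder, assuming a 299x299 input and the (8, 8, 1536) feature map that InceptionResNetV2 produces:

import tensorflow as tf

IMG_HEIGHT = IMG_WIDTH = 299  # assumed input resolution for InceptionResNetV2

image_input = tf.keras.Input(shape=(IMG_HEIGHT, IMG_WIDTH, 3))
backbone = tf.keras.applications.InceptionResNetV2(include_top=False, weights="imagenet")
backbone.trainable = False  # keep the pretrained ImageNet weights frozen
x = backbone(image_input)   # feature map of shape (batch, 8, 8, 1536)
# flatten the 8x8 grid into a sequence of 64 feature vectors for attention
x = tf.keras.layers.Reshape((-1, 1536))(x)
encoder = tf.keras.Model(inputs=image_input, outputs=x)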
2:52
so the next part the decoder is a bit
2:54
complex
2:56
so let's take a look very carefully
3:00
so this is the entire architecture of
3:02
the decoder
3:04
it gets words one by one and merges the
3:07
information of words and images which is
3:10
coming from the encoder output
3:14
and tries to predict the next words
3:17
so this decoder itself is a kind of
3:20
iterative operation so by calling it
3:23
again and again autoregressively we can
3:26
eventually generate text captions
3:29
there are so many variations for this
3:32
decoder design but here we will use a
3:34
Transformer-like architecture although
3:37
we still use an RNN, a GRU
3:40
so let's zoom into each component
3:45
the first embedding layer creates word
3:47
embeddings which was discussed in other
3:50
videos
3:51
and we are passing it to GRU layer
3:56
if you forgot what gru is it's a
3:58
variation of recurrent neural network or
4:01
you can call RNN
4:03
RNN takes inputs and updates its
4:06
internal States and generate output
4:09
so by passing sequential data like text
4:12
Data it keeps the sequential
4:14
dependencies from previous inputs like
4:16
previous words
4:18
the GRU output goes to the attention layer
4:22
which mixes the information of text and
4:24
image
4:26
in tensorflow Keras we can use
4:28
predefined layers in the same way as
4:30
other layers
4:32
there are multiple implementations but
4:34
here we simply use tf.keras.layers.
4:37
Attention
4:39
but if you want to use a more Transformer-
4:41
like architecture
4:43
you can also pick tf.keras.layers.
4:46
MultiHeadAttention which uses multiple
4:49
attention heads
4:51
you can simply switch and use it in
4:54
almost the same way
4:56
inside attention layer it looks like
4:58
this
4:59
as you may have already seen in another
5:01
video about attention mechanism
5:04
but the unique thing here is it pays
5:07
attention to image features from text
5:09
data
5:11
by doing so it can calculate the
5:14
attention score by mixing both
5:16
information
5:18
going back to code you can see this
5:21
attention layer takes two inputs
5:24
GRU output and encoder output
5:31
internally GRU output is used as
5:31
attention query and key
5:34
and encoder output as value
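A sketch of that wiring (gru_output and encoder_output stand for the tensors coming out of the GRU layer and the encoder; the multi-head sizes are assumptions):

import tensorflow as tf

# tf.keras.layers.Attention takes [query, value]: the query comes from
# the GRU output and the value from the encoder output
context = tf.keras.layers.Attention()([gru_output, encoder_output])

# switching to multi-head attention is almost a drop-in change
mha = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=256)  # assumed sizes
context = mha(query=gru_output, value=encoder_output)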
5:38
the last components are add layer and
5:41
layer normalization layer
5:44
add layer just adds two same-shaped
5:48
vectors
5:50
as you can see here Gru output is passed
5:53
to attention layer as we discussed and
5:56
to this add layer directly
5:59
these two flows are eventually merged in
6:02
this add layer
6:04
this kind of Architecture is called skip
6:07
connection
6:09
which has been a very popular deep
6:11
neural network design pattern since
6:13
resnet
6:15
so it is also called residual connection
6:20
this skip connection is very useful
6:23
especially when you want to design a
6:26
very deep neural network
6:28
and it is also used in the Transformer
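A sketch of the skip connection in Keras, reusing the gru_output and context tensors from above:

import tensorflow as tf

# the GRU output feeds both the attention layer and the Add layer directly,
# and the two flows merge here: a residual (skip) connection
x = tf.keras.layers.Add()([gru_output, context])
x = tf.keras.layers.LayerNormalization()(x)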
6:32
with this now we could build entire
6:35
decoder
6:36
so we are ready to train the encoder
6:38
decoder image captioning model using the
6:41
captioning data set
6:43
we will see how it works in the next
6:45
video
6:47
but before moving on to the next one
6:50
I want to explain a bit more about
6:52
inference phase where we can actually
6:55
generate captions for original images
6:58
because this process may look a bit
7:00
complex
7:02
here you can see three steps and we're
7:05
gonna Implement these steps in a custom
7:07
inference function
7:10
number one, generate the GRU
7:12
initial state and create a start token
7:16
in training phase tensorflow Keras can
7:19
automatically handle Gru state for each
7:22
sequence
7:23
but in this inference phase since we
7:26
design our own custom function we need
7:28
to write a logic to deal with it
7:30
explicitly
7:33
so at the beginning of each captioning
7:35
we explicitly initialize the gru state
7:38
with some value
7:41
and at the same time remember our
7:44
decoder is an autoregressive function
7:47
but since we haven't got any word
7:49
prediction yet at the beginning of the
7:51
inference
7:53
we pass start token which is a special
7:56
token that means the beginning of a
7:58
sentence
8:00
number two pass an input image to the
8:03
encoder and extract the feature Vector
8:05
as we discussed and number three pass
8:09
the vector to
8:11
this time the decoder, and generate a caption
8:14
word in the for loop until it returns the end
8:18
token or it reaches the max caption length
8:21
which is just a hyperparameter
8:23
specifying some number like 64 and
8:28
in this for Loop we Define all the
8:30
procedures of caption Generation by
8:33
calling the decoder autoregressively
8:37
end token is a special token too which
8:40
means the end of the sequence
8:42
so when our decoder generates this token
8:45
we can finish this for loop
8:48
or you can go out of the loop when the
8:50
length of the caption reaches some
8:52
number, the max caption length
8:56
let's take a look at the code one by one
8:59
in the first step we initialize two
9:02
things zero State and start token
9:06
in this case Gru state is simply
9:08
initialized with zero vectors
9:11
and we set the start token as the first
9:14
input word for the decoder
9:18
in terms of the word-to-index function
9:20
used here I'm going to explain in the
9:22
next video
9:24
but basically it just tokenizes the
9:26
words to word tokens which is the standard
9:29
text pre-processing technique
9:31
in the next step we pre-process the
9:34
input image a bit and pass it to the
9:36
encoder we trained
9:39
in terms of the image pre-processing
9:42
it reads and decodes the JPEG in the first
9:44
line
9:45
and it resizes it
9:47
from any arbitrary resolutions to
9:50
specific resolution
9:53
and it changes the scale from 0 to 255
9:58
to 0 to 1 in the third line
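A sketch of that three-line preprocessing, with an assumed helper name and target size:

import tensorflow as tf

def read_image(image_path, target_size=(299, 299)):  # assumed helper name and size
    img = tf.io.read_file(image_path)
    img = tf.io.decode_jpeg(img, channels=3)  # first line: decode the JPEG
    img = tf.image.resize(img, target_size)   # second line: resize to a fixed resolution
    return img / 255.0                        # third line: rescale 0-255 to 0-1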
10:01
and the last phase, the decoder for loop
10:05
it is a bit complex so I will explain in
10:07
the next video more in detail with the
10:10
actual code
10:11
but the main thing here is to call the
10:14
decoder by passing the three things
10:17
dec_inputs means decoder input which
10:21
should have the word token predicted in the
10:24
previous iteration
10:26
and as we talked if it is the first
10:28
iteration this would be the start token
10:34
gru_state is the current GRU state we
10:36
discussed
10:37
and please note that the decoder also
10:40
outputs the updated GRU state
10:45
and last but not least features
10:48
this is the image feature we extracted
10:50
with the encoder
10:52
by passing them we can get the actual
10:55
next word prediction
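Putting the three steps together, a minimal sketch of the custom inference function; the decoder call signature and the helpers word_to_index, index_to_word, read_image, and the unit sizes are all assumptions:

import tensorflow as tf

def generate_caption(image_path, gru_units=512, max_caption_len=64):
    # step 1: zero-initialize the GRU state and start from the <start> token
    gru_state = tf.zeros((1, gru_units))
    dec_input = tf.expand_dims([word_to_index("<start>")], 0)

    # step 2: encode the image into feature vectors
    features = encoder(tf.expand_dims(read_image(image_path), 0))

    # step 3: call the decoder autoregressively until <end> or the length cap
    words = []
    for _ in range(max_caption_len):
        predictions, gru_state = decoder([dec_input, gru_state, features])
        next_id = int(tf.argmax(predictions[0, -1]))
        word = index_to_word(next_id).numpy().decode()
        if word == "<end>":
            break
        words.append(word)
        dec_input = tf.expand_dims([next_id], 0)  # feed the prediction back in
    return " ".join(words)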
10:58
this is a very simple text generation
11:01
model from images
11:03
but this kind of iteration is very
11:06
similar even in very large language
11:09
generation models like Google Bard
11:13
they basically predict the next word
11:15
autoregressively in this way one by one
11:18
based on some information and learned
11:21
knowledge which is embedded in a huge
11:24
number of parameters
11:27
in the next video I will walk you
11:29
through the entire notebook
11:31
and then we will check what kind of
11:33
captions this model can generate
11:37
thank you so much for watching and see
11:40
you in the next video
Generative AI Learning
宝玉的技术分享
10 / 12
This course teaches you how to create an image captioning model by using deep learning. You will learn about the different components of an image captioning model, such as the encoder and decoder, and how to train and evaluate your model. By the end of this course, you will be able to create your own image captioning models and use them to generate captions for images.
Create Image Captioning Models: Overview
This module teaches you how to use deep learning to create an image captioning model. You will learn about the different components of an image captioning model, such as the encoder and decoder, and how to train and evaluate your model. By the end of this module, you will be able to create your own image captioning models and use them to generate captions for images.
Generative AI Learning 11 - Create Image Captioning Models, Part 2: Walkthrough
宝玉的技术分享
4.55K subscribers
109 views Jun 27, 2023 Generative AI Learning
11. Create Image Captioning Models | Part 2: Walkthrough
Transcript
0:00
hi everyone I'm Takumi, a machine learning
0:03
engineer at the Google Advanced
0:05
Solutions Lab
0:06
this is the second half of the image
0:08
captioning section if you haven't seen
0:11
the first half I recommend checking it
0:13
at first
0:15
and in this video I'm going to walk you
0:17
through the entire code notebook to help
0:20
you understand how to create a very
0:22
simple generative model
0:25
all the setup information is written in
0:28
the ASL GitHub repository
0:30
you can find the link in the slide or in
0:34
the description below this video
0:37
after setting up the Vertex AI Workbench
0:40
environment and cloning the repo
0:42
following the instruction
0:44
you can find the image captioning
0:46
notebook under
0:48
asl-ml-immersion
0:50
notebooks
0:52
and multi_modal
0:55
solutions
0:57
here you go you can find the
0:59
image_captioning.ipynb notebook so
1:02
please open this file
1:06
and here you can see all the process and
1:09
instructions to build and use an image
1:12
captioning model which we discussed in
1:13
the previous video
1:15
let's take a look from the first cell
1:19
in the first cell of course we install
1:22
all the dependencies including
1:24
tensorflow Keras
1:27
and here you can find tensorflow.keras.
1:29
layers and we're importing all the layers we
1:32
need for the image captioning model
1:34
including GRU,
1:37
add layer, attention layer, dense layer,
1:40
embedding layer, and layer normalization
1:43
layer
1:45
so let's run one by one
1:49
and in the next cell
1:54
we Define some hyper parameters
1:56
including vocabulary size which means
1:59
how many vocabulary words we're gonna use
2:01
for image captioning
2:03
or you can find feature extractor which
2:06
means what kind of model we want to use
2:08
in encoder model so in this case as we
2:12
discussed in the previous video we have
2:14
specified Inception resnet V2 which is
2:17
very classical CNN based model
2:20
and all the definitions below, image
2:23
height, width, channels and the feature
2:26
shape, are coming from the definition of
2:28
the Inception ResNet V2 and especially
2:30
this feature shape (8, 8, 1536)
2:36
is the shape this Inception ResNet V2
2:39
produces
2:41
so let's define
2:43
in this way
2:48
cool
2:49
so in the next cell
2:51
we're going to load the data
2:53
from TFDS which means TensorFlow
2:56
Datasets
2:58
so TensorFlow Datasets hosts this
3:00
caption data set under the name
3:02
coco_captions so we can specify this name and
3:05
load the data
3:07
and after loading data we can pass some
3:11
pre-processing function
3:13
get_image_label which is defined here
3:16
get_image_label and here you can find
3:19
some pre-processing, very basic
3:21
pre-processing
3:23
including changing the size of the image
3:27
or changing the scale of the image
3:30
and returning the image tensor and the
3:33
caption at the same time
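A sketch of the loading and mapping step; the exact feature layout of the TFDS coco_captions examples is an assumption here:

import tensorflow as tf
import tensorflow_datasets as tfds

def get_image_label(example, target_size=(299, 299)):
    # basic preprocessing: resize the image, rescale to 0-1, pick one caption
    image = tf.image.resize(example["image"], target_size) / 255.0
    caption = example["captions"]["text"][0]  # assumed feature layout
    return image, caption

train_ds = tfds.load("coco_captions", split="train")
train_ds = train_ds.map(get_image_label, num_parallel_calls=tf.data.AUTOTUNE)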
3:37
so let's run in the same way
3:40
and let's take a look at some of the
3:42
example
3:51
here we can see four random
3:55
examples
3:56
and each pair of image and text
4:00
makes sense to me so 'a white plate with a
4:03
toasted sandwich,
4:05
chips and fries' for this image and
4:08
another caption for another image
4:11
and we have a lot of image so if you
4:13
want to see in the other example you can
4:16
run this cell again and you will see
4:18
another example
4:21
so let's move on
4:24
so since we have text data we need to
4:27
pre-process the text data in a kind
4:30
of standard way
4:31
so
4:32
in this cell we add start and end
4:36
special tokens
4:39
which we discussed in the slide as well
4:42
so by adding this so we can handle this
4:45
token as a kind of special sign
4:47
the start token means a special token
4:50
that means the beginning of the
4:53
sentence
4:54
and in the same way the end token means
4:57
the end of the sentence
5:01
so we're gonna add these things
5:03
in the same way
5:05
train_ds.map and then pass this
5:07
function name
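A sketch of that mapping function:

import tensorflow as tf

def add_start_end_token(image, caption):
    # wrap every caption in the special markers, e.g. "<start> ... <end>"
    caption = tf.strings.join(["<start>", caption, "<end>"], separator=" ")
    return image, caption

train_ds = train_ds.map(add_start_end_token)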
5:10
let's move on
5:14
and
5:15
this is a very important pre-processing
5:18
so now we have text Data caption data so
5:22
we're going to create tokenizer
5:25
so by creating tokenizer we can tokenize
5:28
word like start token or cat or dog to
5:33
some index
5:35
in tensorflow it is very easy you can
5:38
just use this TextVectorization
5:41
module
5:42
and you can call it
5:45
by passing all the caption
5:48
data to this TextVectorization
5:51
layer
5:52
so it takes some time around five
5:55
minutes in my environment so let's wait
5:57
until it finishes
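A sketch of this step (the vocabulary size is an assumed hyperparameter; the real notebook may also customize standardization so the <start>/<end> markers survive punctuation stripping):

import tensorflow as tf

VOCAB_SIZE = 20000    # assumed vocabulary-size hyperparameter
MAX_CAPTION_LEN = 64

tokenizer = tf.keras.layers.TextVectorization(
    max_tokens=VOCAB_SIZE,
    output_sequence_length=MAX_CAPTION_LEN,  # pad/truncate captions to 64 tokens
)
# adapt() scans all the caption data to build the vocabulary;
# this is the step that takes a few minutes
tokenizer.adapt(train_ds.map(lambda image, caption: caption))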
6:01
now it finished
6:03
now
6:05
let's try this tokenizer by passing some
6:10
sample sentence
6:12
start token, this is the sentence, and end
6:15
token
6:16
so now you can see it is tokenized in
6:19
this way
6:21
and also here you can find a lot of
6:24
paddings
6:25
by changing this max caption length you
6:29
can control the length of this padding
6:31
but in this case we are specifying 64 so
6:35
all the other captions
6:37
will be
6:39
padded in this way up to this max
6:42
caption length
6:47
and in the same way you can see
6:50
the behavior of this tokenizer
6:54
this is very useful, once you create it you
6:57
can apply the tokenizer to different
6:59
captions and convert text data to
7:04
word tokens
7:07
and it's nice to create converters at
7:11
this point
7:12
so here we can find the StringLookup layer
7:15
the StringLookup layer, and creating
7:17
converters from word to index and also
7:21
index to word so we're going to use these
7:23
modules later
7:25
so this is quite useful
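A sketch of the two converters:

import tensorflow as tf

vocab = tokenizer.get_vocabulary()

# converters between words and token indices, reusing the tokenizer's vocabulary
word_to_index = tf.keras.layers.StringLookup(mask_token="", vocabulary=vocab)
index_to_word = tf.keras.layers.StringLookup(mask_token="", vocabulary=vocab, invert=True)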
7:28
and then we can create a final data set
7:32
so this is very important part
7:34
so we have train_ds and we're going to apply an
7:38
additional create_ds function, this
7:41
function
7:42
and as you can see it returns image
7:45
tensor and caption, so this is the tuple
7:50
image tensor will go to encoder and
7:52
caption will go to the decoder
7:56
and also we are creating Target which is
7:59
label
8:02
and in this function you can find this
8:04
target is created from the caption by
8:09
shifting
8:10
this caption
8:12
by one word
8:16
okay by doing so
8:18
we're going to create a shifted caption
8:20
which means the next word okay and we're
8:24
going to utilize this as the target
8:31
so let's define and apply this function
8:35
and create a batch and specify the batch
8:38
size
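A sketch of this function and the batching, with an assumed batch size; the exact shifting code in the notebook may differ:

import tensorflow as tf

BATCH_SIZE = 32  # assumed

def create_ds_fn(image, caption):
    caption = tokenizer(caption)
    # the target (label) is the caption shifted left by one word,
    # with the vacated last position padded with zero
    target = tf.roll(caption, shift=-1, axis=0)
    target = tf.concat([target[:-1], tf.zeros([1], dtype=target.dtype)], axis=0)
    return (image, caption), target

train_ds = train_ds.map(create_ds_fn).batch(BATCH_SIZE)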
8:40
and everything is ready
8:43
so let's take a look at some of the
8:45
data set
8:49
here we go so you can find the image in
8:52
this shape and the caption in this shape
8:55
and the label in the same shape as the caption
8:58
because we are just shifting and also we
9:01
are padding the shifted part with zero
9:04
values
9:06
looks nice
9:08
so the next part is the model
9:12
most of the model code is already
9:14
explained in the previous video so I'm
9:16
going to go through very quickly but if
9:18
you are not very familiar or very
9:21
confident with that then you can go back
9:23
to the previous slide and check
9:25
what is going on inside the encoder and
9:27
decoder. so in this video let's
9:31
quickly run these things so this is the
9:34
encoder and as you can see we just applied
9:37
the Inception ResNet V2 to
9:41
image data
9:43
and please note that in this case we are
9:45
freezing most of the parts of this
9:48
CNN
9:49
because we don't need to train
9:52
this model
9:53
basically this kind of backbone
9:56
is pre-trained by using a huge data set, in
9:58
this case the ImageNet data set, so of
10:01
course if you want to fine-
10:04
tune it again it is possible but in this
10:07
case we want to just preserve the way
10:10
it's pre-trained
10:14
so next
10:16
let's move on to the decoder
10:18
it is a bit complex as we discussed and
10:21
here you can find a lot of instruction
10:22
about the attention layer
10:25
and also the steps of the decoder which
10:28
we discussed in the previous video
10:32
and here you can find the definitions so
10:35
here you can find embedding layer to
10:37
create word embeddings and first GRU
10:40
layer and attention layer add layer
10:44
layer normalization and final dense
10:47
layer
10:48
so let's define in this way
10:52
so model looks like this
10:55
embedding layer Gru attention add layer
10:59
normalization and dense
11:01
and it has so many parameters
11:05
yeah
11:08
after defining decoder and also encoder
11:11
we can create a final model, a tf.keras
11:14
Model, and define inputs and outputs
11:18
and as you can see
11:20
it has two inputs
11:23
image inputs go to the encoder and word
11:26
inputs go to
11:28
the decoder
11:32
and the output should be
11:34
the decoder output
11:37
now model is ready but before running
11:40
training
11:42
we need to Define loss function as usual
11:45
so in terms of the loss
11:48
our model is basically a classification
11:50
model since the decoder generates a lot
11:53
of probabilities for each class, each
11:55
word class, each vocabulary entry, so we can
11:59
use sparse categorical cross-entropy as
12:02
usual for a classification problem
12:05
but in this case our data is padded so
12:09
it has a lot of zero values and a
12:12
lot of meaningless values so we
12:15
want to remove that part
12:18
so in order to do so we are defining
12:20
this custom loss function
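A sketch of such a masked loss, assuming the decoder outputs probabilities (not logits) and that padding uses token id 0:

import tensorflow as tf

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=False, reduction="none"  # assumes the decoder ends in a softmax
)

def loss_function(real, pred):
    loss_ = loss_object(real, pred)
    # zero out positions where the label is padding (token id 0)
    mask = tf.cast(tf.math.not_equal(real, 0), loss_.dtype)
    loss_ *= mask
    return tf.reduce_sum(loss_) / tf.reduce_sum(mask)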
12:25
and then everything is ready so let's
12:27
compile the model
12:31
and we can run training
12:34
and in terms of the training it takes 15
12:36
minutes to 20 minutes
12:39
with one GPU, one T4 GPU, to train the
12:43
network
12:45
so if you want to add additional epochs
12:48
it's okay you can do that and I think
12:51
you can get the slightly better result
12:53
but one epoch is enough
12:57
just to check how it works so let's
13:01
just keep it as one and run training
13:05
and let's wait 15 to 20 minutes until it
13:09
finishes the training. now training is done
13:12
so let's use it for captioning
13:16
but before that we need to rebuild the
13:19
decoder for inference in order to
13:21
control the gru State manually as we
13:24
talked in the previous video
13:28
so in this cell by reusing the trained
13:32
layers we are creating a model for
13:34
inference
13:37
so here you can find trained decoder GRU,
13:41
trained decoder attention, and so on
13:45
and compared to the training model
13:48
we are adding
13:51
GRU state to its I/Os
13:55
for inputs we are adding GRU state
13:58
inputs and for outputs we are adding GRU
14:01
state as an output
14:04
so by doing so we can control the gru
14:06
state in the inference loop
14:10
okay so let's generate text with this
14:15
custom inference Loop function
14:18
we already discussed what kind of
14:21
components it should have in the
14:23
previous video but let's review very
14:25
briefly
14:27
so first we initialize Gru States in
14:30
this case just initialize with zero
14:32
vectors simply
14:34
and then it will get image and
14:37
pre-process the image and pass it to
14:40
encoder
14:41
of course the trained encoder
14:45
and we can get the image
14:47
features
14:49
and before passing it to the decoder we
14:52
also initialize the start token
14:56
as the first word
14:59
and then
15:01
we are going to
15:02
repeat this for loop again and again
15:06
and generate text one by one
15:09
so the step looks like this, calling the decoder
15:12
of course
15:13
and it returns a lot of predictions,
15:16
the word probabilities, so there
15:20
are so many ways to pick the actual
15:23
final word, the final selection, from
15:26
the list of word probabilities
15:29
but in this case we are pulling the word
15:32
kind of stochastically
15:34
to introduce some randomness
15:38
so these lines of code are
15:41
doing that and eventually picking up
15:44
some word and
15:47
bringing it back to the word
15:50
from the word token by using the
15:53
tokenizer
15:55
and appending to the list
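A sketch of that stochastic pick, reusing the predictions tensor and the index_to_word converter named earlier:

import tensorflow as tf

# predictions[0, -1] holds the probabilities for the next word; instead of
# always taking the argmax, draw the word stochastically for some randomness
logits = tf.math.log(predictions[0, -1] + 1e-10)
next_id = int(tf.random.categorical(logits[tf.newaxis, :], num_samples=1)[0, 0])
word = index_to_word(next_id).numpy().decode()  # back from token id to word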
15:58
So eventually we should get some
16:00
captions so let's take a look at the
16:02
result so Define this function and let's
16:05
call it
16:10
so here
16:12
you can see caption samples for this
16:16
image
16:17
so this sample image is
16:19
located in this directory, just passing
16:23
the baseball.jpg, and it returns
16:26
five captions
16:28
it looks like this
16:30
a baseball player standing next to the
16:33
bat
16:33
a catcher in the field playing baseball
16:36
or something like that
16:39
it is not grammatically perfect but
16:43
still you can see it is generating
16:46
text, generating multiple texts and
16:48
generating meaningful text and also
16:52
we can see our model is capturing
16:54
important information like baseball or
16:58
catcher or
17:01
a man standing next to another man
17:04
or baseball field or something like that
17:08
so
17:09
still it's not perfect but
17:12
it is generating very meaningful text
17:16
it's very surprising isn't it so the
17:19
model is very simple we are just
17:21
stacking encoder and decoder and passing
17:24
the image data to the encoder and
17:28
the decoder generates captions one by
17:30
one in an autoregressive way
17:33
so just by stacking these we can
17:36
create this kind of very small
17:38
generative model
17:40
again currently there are so many
17:42
generative large language models out
17:44
there of course they have more complex
17:47
and larger networks and are trained on much
17:50
larger data sets
17:52
but the architecture may look similar to
17:54
this simple model
17:56
thank you so much for watching this
17:58
video I hope you enjoyed
18:00
if you like this presentation you'll
18:02
find more in our ASL GitHub repository
18:05
with 90 plus machine learning notebooks
18:10
if you find it useful please don't
18:12
forget to star the repository
Generative AI Learning
宝玉的技术分享
11 / 12
This course teaches you how to create an image captioning model by using deep learning. You will learn about the different components of an image captioning model, such as the encoder and decoder, and how to train and evaluate your model. By the end of this course, you will be able to create your own image captioning models and use them to generate captions for images.
Create Image Captioning Models: Overview
This module teaches you how to use deep learning to create an image captioning model. You will learn about the different components of an image captioning model, such as the encoder and decoder, and how to train and evaluate your model. By the end of this module, you will be able to create your own image captioning models and use them to generate captions for images.
Generative AI Learning 12 - Introduction to Generative AI Studio
宝玉的技术分享
4.55K subscribers
174 views Jun 27, 2023 Generative AI Learning
12. Introduction to Generative AI Studio
Transcript
0:00
welcome to the introduction to the
0:02
generative AI Studio course in this
0:05
video you learn what generative AI
0:06
studio is and describe its options for
0:09
use you also demo the generative AI
0:11
Studio's language tool yourself. what is
0:14
generative AI
0:16
it is a type of artificial intelligence
0:18
that generates content for you
0:20
what kind of content well the generative
0:23
content can be multimodal including text
0:25
images audio and video when given a
0:29
prompt or a request generative AI can
0:31
help you achieve various tasks such as
0:33
document summarization information
0:35
extraction code generation marketing
0:38
campaign creation virtual assistance and
0:41
call center bot and these are just a few
0:44
examples how does AI generate new
0:46
content it learns from a massive amount
0:48
of existing content this includes text
0:51
audio and video the process of learning
0:53
from existing content is called training
0:55
which results in the creation of a
0:57
foundation model. an LLM, or large
1:00
language model, which powers chatbots
1:02
like Bard, is a typical example of a
1:05
foundation model the foundation model
1:07
can then be used to generate content and
1:09
solve General problems such as content
1:11
extraction and document summarization it
1:14
can also be trained further with new
1:16
data sets in your field to solve
1:17
specific problems such as financial
1:19
model generation and Healthcare
1:21
Consulting this results in the creation
1:23
of a new model that is tailored to your
1:25
specific needs
1:27
how can you use the foundation model to
1:29
power your applications and how can you
1:31
further train or tune the foundation
1:33
model to solve a problem in your
1:35
specific field Google Cloud provides
1:37
several easy to use tools that help you
1:39
use generative AI in your projects with
1:42
or without an AI and machine learning
1:44
background
1:45
one such tool is vertex AI
1:47
vertex AI is an end-to-end ml
1:50
development platform on Google Cloud
1:52
that helps you build deploy and manage
1:54
machine learning models with vertex AI
1:57
if you are an app developer or data
1:59
scientist and want to build an
2:01
application you can use generative AI
2:03
Studio to quickly prototype and
2:05
customize generative AI models with no
2:07
code or low code if you are a data
2:09
scientist or ml developer who wants to
2:12
build and automate a generative AI model
2:14
you can start from model Garden
2:16
model Garden lets you discover and
2:19
interact with Google's foundation and
2:21
third-party open source models and has
2:23
built-in ml Ops tools to automate the ml
2:26
pipeline
2:28
in this course you focus on generative
2:30
AI Studio
2:32
generative AI Studio supports language
2:34
vision and speech the list grows as you
2:37
are learning this course
2:38
for language you can design a prompt to
2:40
perform tasks and tune language models
2:43
for vision you can generate an image
2:45
based on a prompt and further edit the
2:47
image for speech you can generate text
2:50
from speech or vice versa
2:53
let's focus on what you can do with
2:54
language in generative AI Studio
2:57
specifically you can design prompts for
3:00
tasks relevant to your business use case
3:02
including code generation
3:04
create conversations by specifying the
3:07
context that instructs how the model
3:08
should respond, and tune a model so it is
3:11
better equipped for your use case which
3:13
allows you to then deploy to an endpoint
3:15
to get predictions or test it in prompt
3:17
design let's walk through these three
3:19
features in detail first is prompt
3:22
design to get started experimenting with
3:24
large language models or llms click on
3:26
new prompt
3:31
in the world of generative AI a prompt
3:34
is just a fancy name for the input text
3:36
that you feed to your model you can feed
3:38
your desired input text like questions
3:40
and instructions to the model the model
3:42
will then provide a response based on
3:44
how you structured your prompt therefore
3:46
the answers you get depend on the
3:47
questions you ask the process of
3:49
figuring out and designing the best
3:51
input text to get the desired response
3:53
back from the model is called prompt
3:55
design which often involves a lot of
3:57
experimentation let's start with a free
3:59
form prompt one way to design a prompt
4:02
is to Simply tell the model what you
4:03
want in other words provide an
4:05
instruction for example generate a list
4:08
of items I need for a camping trip to
4:10
Joshua Tree National Park we send this
4:12
text to the model and you can see the
4:14
model outputs a useful list of items we
4:16
don't want to camp without
4:18
this approach of writing a single
4:20
command so that the llm can adopt a
4:22
certain behavior is called zero shot
4:24
prompting generally there are three
4:26
methods that you can use to shape the
4:28
model's response in a way that you
4:29
desire zero shot prompting is a method
4:32
where the llm is given no additional
4:34
data on the specific task that is being
4:36
asked to perform instead it is only
4:38
given a prompt that describes the task
4:40
for example if you want the llm to
4:43
answer a question you just prompt what
4:45
is prompt design. one-shot prompting is a
4:48
method where the llm is given a single
4:50
example of the task that is being asked
4:52
to perform for example if you want the
4:55
llm to write a poem you might provide a
4:57
single example poem and few shot
4:59
prompting is a method where the llm is
5:02
given a small number of examples of the
5:04
task that it is being asked to perform
5:06
for example if you want the llm to write
5:09
a news article you might give it a few
5:11
news articles to read you can use the
5:14
structured mode to design the few-shot
5:16
prompting by providing a context and
5:18
additional examples for the model to
5:20
learn from
5:21
the structured prompt contains a few
5:23
different components first we have the
5:25
context which instructs how the model
5:27
should respond you can specify words the
5:29
model can or cannot use topics to focus
5:32
on or avoid or a particular response
5:34
format and the context applies each time
5:36
you send a request to the model let's
5:38
say we want to use an llm to answer
5:40
questions based on some background text
5:42
in this case a passage that describes
5:44
changes in rainforest vegetation in the
5:47
Amazon we can paste in the background
5:48
text as the context
5:50
then we add some examples of questions
5:52
that could be answered from this passage
5:54
like what does LGM stand for or what did
5:58
the analysis from the sediment deposits
6:00
indicate we'll need to add in the
6:02
corresponding answers to these questions
6:03
to demonstrate how we want the model to
6:06
respond then we can test out the prompt
6:07
we've designed by sending a new question
6:09
as input and there you go you've
6:11
prototyped a Q&A system based on
6:13
background text in just a few minutes
6:16
please note a few best practices around
6:18
prompt design be concise be specific and
6:21
well-defined ask one task at a time turn
6:25
generative tasks into classification
6:27
tasks for example instead of asking what
6:30
programming language to learn ask if
6:32
python Java or C is a better fit for a
6:35
beginner in programming and improve
6:37
response quality by including examples
6:40
adding instructions in a few examples
6:42
tends to yield good results however
6:44
there's no one best way to write a
6:46
prompt you may need to experiment with
6:48
different structures formats and
6:49
examples to see what works best for your
6:51
use case for more information about
6:54
prompt design please check text prompt
6:56
design in the reading list
6:58
so if you've designed a prompt that you
7:00
think is working pretty well you can
7:01
save it and return to it later your
7:03
saved prompt will be visible in the
7:05
prompt Gallery which is a curated
7:07
collection of sample prompts that show
7:09
how generative AI models can work for a
7:11
variety of use cases finally in addition
7:14
to testing different prompts and prompt
7:16
structures there are a few model
7:18
parameters you can experiment with to
7:20
try and improve the quality of responses
7:22
first there are different models you can
7:24
choose from each model is tuned to
7:26
perform well on specific tasks you can
7:29
also specify the temperature top p and
7:31
top k
7:33
these parameters all adjust the
7:34
randomness of responses by controlling
7:36
how the output tokens are selected when
7:39
you send a prompt to the model it
7:40
produces an array of probabilities over
7:42
the words that could come next and from
7:45
this array we need some strategy to
7:47
decide what to return a simple strategy
7:49
might be to select the most likely word
7:51
at every time step but this method can
7:53
result in uninteresting and sometimes
7:55
repetitive answers on the contrary if
7:58
you randomly sample over the
8:00
distribution returned by the model you
8:02
might get some unlikely responses by
8:04
controlling the degree of Randomness you
8:06
can get more unexpected and some might
8:08
say creative responses back to the model
8:11
parameters temperature is a number used
8:13
to tune the degree of Randomness low
8:15
temperature means to select the words
8:17
that are highly possible and more
8:19
predictable in this case those are
8:21
flowers and the other words that are
8:22
located at the beginning of the list
8:24
this setting is generally better for
8:26
tasks like Q&A and summarization where
8:29
you expect a more predictable answer
8:30
with less variation. high temperature
8:33
means to select the words that have low
8:35
possibility and are more unusual in this
8:38
case those are bugs and the other words
8:40
that are located at the end of the list
8:42
this setting is good if you want to
8:44
generate more creative or unexpected
8:46
content in addition to adjusting the
8:48
temperature top K lets the model
8:50
randomly return a word from the top K
8:52
number of words in terms of possibility
8:55
for example top 2 means you get a random
8:58
word from the top two possible words
9:00
including flowers and trees this
9:03
approach allows the other high scoring
9:04
word a chance of being selected however
9:07
if the probability distribution of the
9:09
words is highly skewed and you have one
9:11
word that is very likely and everything
9:13
else is very unlikely this approach can
9:16
result in some strange responses
9:19
the difficulty of selecting the best top
9:21
K value leads to another popular
9:23
approach that dynamically sets the size
9:25
of the short list of words
9:27
top P allows the model to randomly
9:29
return a word from the top P probability
9:31
of words
9:33
with top P you choose from a set of
9:35
words with the sum of the likelihoods
9:37
not exceeding P for example P of 0.75
9:41
means you sample from the smallest set of words
9:43
whose cumulative probability
9:45
reaches 0.75 in this case it
9:49
includes three words flowers trees and
9:51
herbs
9:53
this way the size of the set of words
9:55
can dynamically increase and decrease
9:57
according to the probability
9:58
distribution of the next word in the
10:00
list
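A small self-contained sketch of how temperature, top-k and top-p interact when picking the next word (the word list and probabilities are made-up illustration values):

import numpy as np

def sample_next_word(words, probs, temperature=1.0, top_k=None, top_p=None):
    # temperature reshapes the distribution: <1 favors the likely words,
    # >1 flattens it toward the unusual ones
    logits = np.log(np.asarray(probs, dtype=float)) / temperature
    p = np.exp(logits - logits.max())
    p /= p.sum()

    order = np.argsort(p)[::-1]  # most likely first
    if top_k is not None:
        order = order[:top_k]    # keep only the k most likely words
    if top_p is not None:
        cutoff = int(np.searchsorted(np.cumsum(p[order]), top_p)) + 1
        order = order[:cutoff]   # smallest set whose cumulative mass reaches top_p
    shortlist = p[order] / p[order].sum()
    return words[np.random.choice(order, p=shortlist)]

# e.g. with top_p=0.75 this keeps "flowers", "trees" and "herbs"
print(sample_next_word(["flowers", "trees", "herbs", "bugs", "stones"],
                       [0.40, 0.25, 0.15, 0.10, 0.10], top_p=0.75))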
10:01
In Sum generative AI Studio provides a
10:04
few model parameters for you to play
10:06
with such as the model temperature top K
10:08
and top P note that you are not required
10:11
to adjust them constantly especially top
10:13
K and top p
10:16
now let's look at the second feature
10:17
which creates conversations
10:19
first you need to specify the
10:21
conversation context context instructs
10:24
how the model should respond
10:26
for example specifying words the model
10:28
can or cannot use topics to focus on or
10:31
avoid or response format
10:33
context applies each time you send a
10:35
request to the model
10:37
for a simple example you can define a
10:39
scenario and tell the AI how to respond
10:41
to help desk queries your name is Roy
10:44
you are a support technician of an I.T
10:46
Department you only respond with have
10:49
you tried turning it off and on again to
10:51
any queries you can tune the parameters
10:53
on the right the same as you do when
10:55
designing the prompt
10:56
to see how it works you can type my
10:59
computer is slow in the chat box and
11:01
press enter the AI responds have you
11:04
tried turning it off and on again
11:06
exactly as you told the AI to do
11:09
the cool thing is that Google provides
11:11
the apis and sdks to help you build your
11:14
own application you can simply click
11:16
view code first you need to download the
11:19
vertex AI sdks that fit your programming
11:21
language like Python and curl. SDK stands
11:25
for software development kits; they implement
11:27
the functions and do the job for you you
11:30
use them like you call libraries from
11:31
the code you then follow the sample code
11:34
and the API and insert the code into
11:36
your application
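A minimal sketch of what such SDK code can look like in Python; the project, model name and parameter values are placeholders, so check the View Code panel for the exact snippet:

import vertexai
from vertexai.language_models import TextGenerationModel

vertexai.init(project="your-project", location="us-central1")  # assumed project

model = TextGenerationModel.from_pretrained("text-bison@001")  # assumed model name
response = model.predict(
    "My computer is slow",
    temperature=0.2,        # low temperature: predictable support-style answers
    max_output_tokens=256,
    top_k=40,
    top_p=0.8,
)
print(response.text)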
11:38
now let's look at the third feature tune
11:40
a language model if you've been
11:42
prototyping with large language models
11:44
you might be wondering if there's a way
11:45
you can improve the quality of responses
11:47
Beyond just prompt design so let's learn
11:49
how to tune a large language model and
11:51
how to launch a tuning job from
11:53
generative AI Studio as a quick recap
11:56
The Prompt is your text input that you
11:58
pass to the model your prompt might look
12:00
like an instruction and maybe you add
12:01
some examples then you send this text to
12:04
the model so that it adopts the behavior
12:06
that you want
12:09
prompt design allows for fast
12:10
experimentation and customization and
12:13
because you're not writing any
12:14
complicated code you don't need to be an
12:16
ml expert to get started
12:18
but producing prompts can be tricky
12:20
small changes in wording or word order
12:22
can affect the model results in ways
12:24
that aren't totally predictable and you
12:26
can't really fit all that many examples
12:28
into a prompt even when you do discover
12:30
a good prompt for your use case you
12:32
might notice the quality of model
12:34
responses isn't totally consistent one
12:36
thing we can do to alleviate these
12:38
issues is to tune the model so what's
12:40
tuning well one version you might be
12:43
familiar with is fine tuning in this
12:45
scenario we take a model that was
12:47
pre-trained on a generic data set we
12:49
make a copy of this model then using
12:51
those learned weights as a starting
12:53
point we retrain the model on a new
12:55
domain-specific data set this technique
12:58
has been pretty effective for lots of
13:00
different use cases but when we try to
13:02
fine-tune llms we run into some
13:04
challenges llms are well as the name
13:07
suggests large so updating every weight
13:10
can take a long training job compound
13:12
all of that computation with the hassle
13:14
and cost of now having to serve this
13:16
giant model and as a result fine tuning
13:18
a large language model might not be the
13:20
best option for you but there is an
13:22
Innovative approach to tuning called
13:24
parameter efficient tuning this is a
13:27
super exciting research area that aims
13:28
to reduce the challenges of fine-tuning
13:30
llms by only training a subset of
13:33
parameters these parameters might be a
13:35
subset of the existing model parameters
13:37
or they could be an entirely new set of
13:39
parameters for example maybe you add on
13:43
some additional layers to the model or
13:45
an extra embedding to The Prompt if you
13:47
want to learn more about parameter
13:48
efficient tuning in some of the
13:50
different methods a summary paper is
13:51
included in the reading list of this
13:53
course but if you just want to get to
13:55
building then let's move to generative
13:56
AI studio and see how to start a tuning
13:59
job from the language section of
14:01
generative AI Studio select tuning to
14:03
create a tune model we provide a name
14:05
then point to the local or cloud storage
14:08
location of your training data parameter
14:10
efficient tuning is ideally suited for
14:12
scenarios where you have modest amounts
14:14
of training data say hundreds or maybe
14:16
thousands of training examples
14:19
your training data should be structured
14:20
as a supervised training data set in a
14:23
text to text format each record or Row
14:25
in the data will contain the input text
14:27
in other words The Prompt which is
14:29
followed by the expected output of the
14:31
model this means that the model can be
14:34
tuned for a task that can be modeled as
14:36
a text to text problem
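A sketch of preparing such a data set; the input_text/output_text field names follow the text-to-text tuning format used at the time and should be treated as an assumption:

import json

# each row pairs the input text (the prompt) with the expected output
examples = [
    {"input_text": "Summarize: Q3 sales grew 8% over Q2 ...",
     "output_text": "Q3 sales were up 8% quarter over quarter."},
    {"input_text": "Summarize: support tickets spiked after the outage ...",
     "output_text": "Ticket volume rose sharply following the outage."},
]
with open("tuning_data.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")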
14:38
after specifying the path to your data
14:40
set you can start the tuning job and
14:42
monitor the status in the Google Cloud
14:44
console
14:45
when the tuning job completes you'll see
14:47
the tuned model in the vertex AI model
14:49
registry and you can deploy it to an
14:51
endpoint for serving or you can test it
14:53
in the generative AI Studio
14:56
in this course you learned what
14:57
generative AI is and the tools provided
14:59
by Google Cloud to empower your project
15:01
with generative AI capabilities
15:04
specifically you focused on generative
15:06
AI Studio where you can use gen AI in
15:09
your application by quickly prototyping
15:11
and customizing generative AI models you
15:15
learn that generative AI Studio supports
15:17
three options language vision and speech
15:19
you then walked through the three major
15:21
features in language design and test
15:23
prompt create conversations and tune
15:26
models
15:27
this was a short lesson introducing
15:29
generative AI Studio on vertex AI for
15:32
more information about natural language
15:34
processing and different types of
15:36
language models like decoder encoder
15:38
Transformer and llm please check the
15:41
course titled natural language
15:42
processing on Google Cloud listed in the
15:45
reading list
15:46
now it's time to play with generative AI
15:48
studio in a Hands-On lab where you
15:50
design and test prompts in both free
15:52
form and structured modes create
15:54
conversations and explore the prompt
15:57
Gallery by the end of this lab you will
15:59
be able to use the capabilities of
16:01
generative AI Studio that we've
16:03
discussed in this course have fun
16:05
exploring
This course introduces Generative AI Studio, a product of Vertex AI that helps you prototype and customize generative AI models so that you can use their capabilities in your applications. In this course, you will learn what Generative AI Studio is, its features and options, and how to use it through a demo of the product.