Lecture 8 - (Google) Generative AI for Media
September 05, 2024
These are my notes from lecture.
Generative AI - a type of AI that can generate novel and useful content. Exponential growth in computing power allows us to reach new levels in ML.
Progress of image generation:
2014 - GAN
2015 - DCGAN
2018 - StyleGAN
2021 - DALL-E
2022 - Imagen
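Imagen - uses a diffusion process. In training, noise is gradually added to an image (image to noise), and the model learns to reverse this, so at generation time we go from noise to an image.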
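To make the "image to noise" direction concrete, here is a minimal sketch of the forward (noising) step as it is usually written for diffusion models. The linear schedule, its values, and the names `linear_betas`, `cumulative_alphas` and `add_noise` are my own assumptions for illustration, not something shown in the lecture or taken from Imagen itself. The reverse direction (noise to image) needs a trained network, so it is not shown.

```python
import math
import random

def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule (values are illustrative only)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar[t] = product of (1 - beta) up to and including step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Forward step: mix the clean signal x0 with Gaussian noise at step t."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]

rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_betas(1000))
image = [1.0, -1.0, 0.5, 0.0]  # stand-in for pixel values
slightly_noisy = add_noise(image, 0, alpha_bars, rng)   # almost the original
mostly_noise = add_noise(image, 999, alpha_bars, rng)   # almost pure noise
```

At step 0 almost all of the original signal survives, while at the last step the signal scale is close to zero, so the result is practically Gaussian noise.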
3 models in a cascade -> the first generates a low-res 64x64 image, the second takes that image as input and scales it to 256x256, and the third scales the result further to 1024x1024.
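The three-stage flow can be sketched roughly as below. This only shows how the resolutions chain together (64 -> 256 -> 1024). Both `base_stage` and `upsample_stage` are hypothetical placeholders: the real stages are learned models, and nearest-neighbour upscaling is swapped in here just so the example runs.

```python
import random

def base_stage(size=64, seed=0):
    """Placeholder for the low-res generator: returns a size x size grid."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(size)] for _ in range(size)]

def upsample_stage(image, factor=4):
    """Placeholder for a super-resolution stage: nearest-neighbour upscaling.

    A real stage would be a learned model that takes `image` as input.
    """
    out = []
    for row in image:
        # Repeat each pixel `factor` times horizontally...
        wide = [v for v in row for _ in range(factor)]
        # ...and each widened row `factor` times vertically.
        out.extend(list(wide) for _ in range(factor))
    return out

def cascade():
    low = base_stage(64)            # 64x64
    mid = upsample_stage(low, 4)    # 256x256
    high = upsample_stage(mid, 4)   # 1024x1024
    return low, mid, high

low, mid, high = cascade()
print(len(low), len(mid), len(high))
```

Each stage only has to solve a smaller problem: the first decides what the image contains, the later ones add detail at a higher resolution.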
Robotics - adding LLMs to the domain of robotics. There is a Vision-Language-Action (VLA) model.
Cool visualisation of attention in transformers in the “Music Transformer” research by Anna Huang.
Link to video: https://www.youtube.com/watch?v=P7Hkh2zOGQ0