Lecture 8 - (Google) Generative AI for Media

These are my notes from the lecture.

Generative AI - a type of AI that can generate novel and useful content. The exponential growth of computing power allows ML to reach new levels.

Progress of image generation:
2014 - GAN
2015 - DCGAN
2018 - StyleGAN
2021 - DALL-E
2022 - Imagen

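Imagen - uses a diffusion process: noise is gradually added to an image (image -> noise), and the model learns to reverse this, going from noise back to an image (noise -> image).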
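A minimal sketch of both directions, in Python. Assumptions: the linear noise schedule (beta from 1e-4 to 0.02 over 1000 steps) is the one from the original DDPM paper, not necessarily what Imagen uses; an "image" is just a flat list of pixel values; and the noise estimate `eps_hat` would in practice come from a trained denoising network, which is not shown. A real sampler walks back through many small steps; this only shows the closed-form noising and its algebraic inversion.

```python
import math
import random
from typing import List, Tuple


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Per-step noise variances, linearly spaced (DDPM-style schedule; an assumption here)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas: List[float]) -> List[float]:
    """Cumulative product of (1 - beta): share of the clean image's variance left at step t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(x0: List[float], t: int, abar: List[float], rng: random.Random) -> Tuple[List[float], List[float]]:
    """Forward process (image -> noise) in closed form: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    signal, noise = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    xt = [signal * p + noise * e for p, e in zip(x0, eps)]
    return xt, eps


def estimate_x0(xt: List[float], eps_hat: List[float], t: int, abar: List[float]) -> List[float]:
    """Reverse direction (noise -> image): invert the closed form given a noise estimate.

    In a real system eps_hat would come from a trained denoising network (not shown).
    """
    signal, noise = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    return [(x - noise * e) / signal for x, e in zip(xt, eps_hat)]


betas = linear_beta_schedule(1000)
abar = alpha_bars(betas)
x0 = [0.5, -0.2, 0.9]  # toy "image" of three pixel values
xt, eps = add_noise(x0, 500, abar, random.Random(0))
print(estimate_x0(xt, eps, 500, abar))  # close to x0, because the true noise was used
```

Because `abar` shrinks towards 0 as t grows, late steps are almost pure noise, which is why generation can start from random noise.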

3 models in a cascade -> a base model generates a low-res 64x64 image from the text prompt; a super-resolution model, conditioned on that image (and the text), upscales it to 256x256; a second super-resolution model then upscales the result to 1024x1024.
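A sketch of the cascade's data flow only (64x64 -> 256x256 -> 1024x1024), in Python. `base_model` and `super_res_model` are hypothetical stand-ins: the real stages are diffusion models, while here the base model returns pseudo-random pixels seeded by the prompt and each super-resolution stage just does 4x nearest-neighbour upsampling. Images are grayscale lists of rows rather than real RGB tensors.

```python
import random
from typing import List

# A grayscale image as a list of rows; a real pipeline would use RGB tensors.
Image = List[List[float]]


def base_model(prompt: str, size: int = 64) -> Image:
    """Hypothetical stand-in for the 64x64 text-to-image model: deterministic noise seeded by the prompt."""
    rng = random.Random(prompt)
    return [[rng.random() for _ in range(size)] for _ in range(size)]


def upsample_nearest(img: Image, factor: int) -> Image:
    """Repeat every pixel `factor` times horizontally and every row `factor` times vertically."""
    return [
        [row[x // factor] for x in range(len(row) * factor)]
        for row in img
        for _ in range(factor)
    ]


def super_res_model(prompt: str, low_res: Image, factor: int = 4) -> Image:
    """Hypothetical stand-in for a super-resolution stage.

    A real model would generate the high-res image with diffusion, using the prompt and
    the low-res image; this stand-in ignores `prompt` and only upsamples.
    """
    return upsample_nearest(low_res, factor)


def cascade(prompt: str) -> List[Image]:
    """Base image followed by two 4x super-resolution stages; returns every stage's output."""
    stages = [base_model(prompt)]
    for _ in range(2):
        stages.append(super_res_model(prompt, stages[-1]))
    return stages


stages = cascade("a corgi playing a trumpet")
print([len(img) for img in stages])  # [64, 256, 1024]
```

Splitting generation this way lets the expensive text-to-image step work at low resolution, while later stages only add detail.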

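Robotics - bringing LLMs into the robotics domain. One example is the Vision-Language-Action (VLA) model, which takes camera images and a natural-language instruction as input and outputs robot actions.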
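One way a VLA model can reuse a language model's decoder for control is to write each action as tokens. A minimal sketch of such an action tokenizer in Python, assuming every action dimension is normalised to [-1, 1] and split into 256 uniform bins (similar to the discretisation reported for Google's RT-2; treat both numbers as assumptions). `tokenize_action` and `detokenize_action` are hypothetical helper names.

```python
from typing import List

NUM_BINS = 256          # assumed number of bins per action dimension
LOW, HIGH = -1.0, 1.0   # assumed normalised range of every action dimension


def tokenize_action(action: List[float]) -> List[int]:
    """Map each continuous action value to an integer bin index in [0, NUM_BINS - 1]."""
    tokens = []
    for value in action:
        clipped = min(max(value, LOW), HIGH)
        index = int((clipped - LOW) / (HIGH - LOW) * NUM_BINS)
        tokens.append(min(index, NUM_BINS - 1))  # value == HIGH would otherwise give NUM_BINS
    return tokens


def detokenize_action(tokens: List[int]) -> List[float]:
    """Map bin indices back to continuous values at the centre of each bin."""
    width = (HIGH - LOW) / NUM_BINS
    return [LOW + (token + 0.5) * width for token in tokens]


# e.g. a hypothetical 3-D end-effector displacement
print(tokenize_action([-1.0, 0.0, 1.0]))  # [0, 128, 255]
```

The model then only has to predict ordinary integer tokens; NUM_BINS sets the resolution of the control signal (here a round-trip error of at most half a bin width).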

Cool visualisation of attention in transformers from the “Music Transformer” research by Anna Huang.
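An attention visualisation plots, for each position in a sequence, how much weight it puts on every earlier position. A minimal sketch of how such a weight matrix is computed with standard causal scaled dot-product attention, in Python. The Music Transformer additionally uses relative positional attention, which is not modelled here, and the toy query/key vectors are made up for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax; -inf entries get weight 0."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention_weights(queries: Matrix, keys: Matrix) -> Matrix:
    """Row i holds how strongly position i attends to positions 0..i (later positions are masked)."""
    scale = math.sqrt(len(queries[0]))
    weights = []
    for i, q in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(q, k)) / scale if j <= i else float("-inf")
            for j, k in enumerate(keys)
        ]
        weights.append(softmax(scores))
    return weights


# Toy queries/keys for a 3-token sequence (made-up numbers)
q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in causal_attention_weights(q, k):
    print([round(w, 2) for w in row])
```

Each row sums to 1, so plotting the matrix (for example as a heatmap) shows where each position's attention goes.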

Link to video: https://www.youtube.com/watch?v=P7Hkh2zOGQ0