Lecture 6. Language models and new frontiers.

Limitations of ML models

The first limitation is generalization. Modern deep networks can fit their training data almost perfectly, but they can fail badly on inputs that come out of the blue, i.e. far from the training distribution. This is especially dangerous in safety-critical applications like self-driving, and it is made worse by the fact that networks rarely report how uncertain their predictions are. Because neural networks are excellent function approximators, they also absorb biases: if the dataset over-represents certain things, the model's outputs will reflect that over-representation.
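One common way to surface that missing uncertainty is a deep ensemble: train several independently initialized copies of the model and treat their disagreement as an uncertainty signal. This is a minimal illustrative sketch, not the lecture's method; the architecture and names below are assumptions.

```python
import torch
import torch.nn as nn

# Minimal deep-ensemble sketch (illustrative, not from the lecture).
def make_model():
    return nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))

# Train several independently initialized copies on the same data
# (training loop omitted); their disagreement approximates uncertainty.
ensemble = [make_model() for _ in range(5)]

def predict_with_uncertainty(x):
    preds = torch.stack([m(x) for m in ensemble])  # (n_models, batch, 1)
    return preds.mean(dim=0), preds.std(dim=0)     # prediction, spread

x = torch.randn(4, 2)   # in practice: real (possibly out-of-distribution) inputs
mean, std = predict_with_uncertainty(x)
# A large std means the ensemble disagrees -- treat that prediction with care.
```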

Diffusion models

VAEs and GANs generate samples in a single shot. Diffusion models instead generate iteratively, refining a sample over many steps.

To train a diffusion model, we go from data to noise; when we run the model, we go from noise back to data.

During training, we use a forward noising process: at every time step we add a bit more noise to the original image, until it becomes pure noise.
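As a concrete illustration, here is a minimal DDPM-style sketch in PyTorch (not the lecture's code; the schedule values and the `model(x_t, t)` interface are assumptions). The closed form lets us jump to any noise level t directly, the training step teaches the network to predict the noise that was added, and the sampling loop reverses the process starting from pure noise.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)     # cumulative signal fraction

def forward_noise(x0, t):
    """Closed-form forward process: sample x_t directly from the clean x0.
    Assumes image batches shaped (B, C, H, W)."""
    eps = torch.randn_like(x0)
    a = alpha_bar[t].sqrt().view(-1, 1, 1, 1)
    s = (1 - alpha_bar[t]).sqrt().view(-1, 1, 1, 1)
    return a * x0 + s * eps, eps

def training_step(model, x0):
    """One step: noise the data, train the model to predict that noise."""
    t = torch.randint(0, T, (x0.shape[0],))
    x_t, eps = forward_noise(x0, t)
    eps_pred = model(x_t, t)                 # model interface is assumed
    return torch.nn.functional.mse_loss(eps_pred, eps)

@torch.no_grad()
def sample(model, shape):
    """Reverse process: start from pure noise and denoise step by step."""
    x = torch.randn(shape)
    for t in reversed(range(T)):
        eps_pred = model(x, torch.full((shape[0],), t))
        coef = betas[t] / (1 - alpha_bar[t]).sqrt()
        x = (x - coef * eps_pred) / alphas[t].sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn(shape)
    return x
```

In practice `model` would be a U-Net conditioned on the time step, but any network matching the `model(x_t, t)` signature fits this sketch.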

Molecular design

Diffusion models are now used in chemistry and biology, where they can generate 3D structures of molecules and novel proteins. A model trained on the structures of existing molecules can then generate entirely new molecules starting from nothing but noise.
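To see why the same machinery transfers, note that a molecule's conformation can be treated as an array of 3D atom coordinates and noised exactly like image pixels. A minimal numpy sketch, with shapes and the noise level assumed:

```python
import numpy as np

# Illustrative: forward noising applied to 3D atom coordinates
# instead of image pixels.
n_atoms = 20
coords = np.random.randn(n_atoms, 3)   # stand-in for a real conformation

alpha_bar_t = 0.3                      # cumulative signal fraction at step t
noised = (np.sqrt(alpha_bar_t) * coords
          + np.sqrt(1 - alpha_bar_t) * np.random.randn(n_atoms, 3))
```

Real molecular diffusion models typically add chemistry-aware structure, such as equivariance to rotations and translations, but the noising core is the same idea.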

LLMs

Large Language Models are very, very big neural networks trained on very, very large amounts of data. They face multiple challenges:

  • robustness (e.g. reliably producing correct grammar),
  • confidence (they often hallucinate, stating wrong things with high confidence),
  • long-term planning (they struggle to follow a plan or pattern over long stretches),
  • logic and discovery (they produce mistakes and illogical text, e.g. they often fail at simple arithmetic).

Lecture: https://youtu.be/N1fbskTpwZ0?si=kuydF29-PoUMyxWI