Convolutional Neural Networks

The third lecture is about CNNs (https://youtu.be/2xqkSUhmmXU?si=o2kaqvN7kKOT5Kzc). They are used mostly for images, typically for classification but sometimes also for regression. A well-known example is the YOLO network, which is used for object detection. The core idea is extracting different features from an image.

Spatial structure

CNNs exploit the spatial structure of an image. A small patch slides over the image, and each neuron receives only the pixels inside its patch. This reduces the number of connections, since connecting every pixel to every neuron is not practical. The patch is a small matrix, e.g. 3x3, and we shift it by a fixed step (the stride), e.g. 2 pixels.
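A minimal sketch of the sliding patch, assuming a single-channel image stored as a NumPy array. The function name extract_patches and the 7x7 test image are made up for illustration; the 3x3 size and the shift of 2 pixels (usually called the stride) are the values from the notes above.

```python
import numpy as np

def extract_patches(image, size=3, stride=2):
    """Slide a size x size window over a 2D image, moving `stride` pixels at a time."""
    h, w = image.shape
    patches = []
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            # Each patch is the small region that one neuron would look at.
            patches.append(image[top:top + size, left:left + size])
    return patches

# Hypothetical 7x7 image whose pixel values are just 0..48.
image = np.arange(49).reshape(7, 7)
patches = extract_patches(image, size=3, stride=2)

# Window positions per axis: (7 - 3) // 2 + 1 = 3, so 3 * 3 = 9 patches in total.
print(len(patches), patches[0].shape)
```

So instead of 49 inputs per neuron, each neuron sees only 9 pixels, and a larger stride gives fewer patches.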

This technique gives us a powerful tool for detecting particular features: filters. A filter is a small matrix of weights that responds strongly to one specific pattern. In the lecture example, the image showed the letter X, and separate filters detected its center and its arms.
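The X example can be sketched roughly as follows. Everything here is an assumption for illustration: the 5x5 image with +1 on the strokes and -1 on the background, the two hand-written filters, and the helper name apply_filter. In a real CNN the filter weights are learned rather than written by hand.

```python
import numpy as np

def apply_filter(image, kernel, stride=1):
    """Slide the kernel over the image and sum the elementwise products.

    This is the cross-correlation that deep learning libraries call convolution.
    """
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Hypothetical 5x5 picture of an X: +1 on the two diagonals, -1 elsewhere.
x_image = -np.ones((5, 5))
for k in range(5):
    x_image[k, k] = 1
    x_image[k, 4 - k] = 1

# Hand-made filters (assumed): one for the crossing, one for a diagonal arm.
center_filter = np.array([[ 1, -1,  1],
                          [-1,  1, -1],
                          [ 1, -1,  1]])
arm_filter = np.array([[ 1, -1, -1],
                       [-1,  1, -1],
                       [-1, -1,  1]])

center_map = apply_filter(x_image, center_filter)
arm_map = apply_filter(x_image, arm_filter)

# The highest response marks where the pattern matches best.
print(np.unravel_index(np.argmax(center_map), center_map.shape))
print(arm_map)
```

The center filter responds most strongly in the middle of the image, while the arm filter responds most strongly at the two ends of the top-left to bottom-right diagonal.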

There is also pooling, most commonly max pooling. It reduces size by replacing each small region, e.g. a 2x2 block, with its maximum value. With 2x2 blocks, each pooling step halves the width and the height, so the size of any image or feature map shrinks quickly.
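A small sketch of 2x2 max pooling, assuming non-overlapping blocks and a 2D NumPy array. The function name max_pool2d and the sample values are made up for illustration.

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Replace each non-overlapping size x size block with its maximum value."""
    h, w = feature_map.shape
    # Drop leftover rows/columns that do not fill a whole block.
    h, w = h - h % size, w - w % size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# Hypothetical 4x4 feature map.
feature_map = np.array([[1, 3, 2, 1],
                        [4, 6, 5, 0],
                        [7, 2, 9, 8],
                        [1, 0, 3, 4]])

pooled = max_pool2d(feature_map)
print(pooled)
# [[6 5]
#  [7 9]]
```

The 4x4 input becomes 2x2, so the number of values drops by a factor of 4 while the strongest response in each block is kept.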

Autonomous navigation

CNNs are also used for autonomous navigation: a car can drive using a CNN. The network takes images of the road, an image of the map and, on top of that, an image of the route, and it computes the probabilities of taking each turn.

Features are extracted from the road image. For example, at a crossing we get 3 boxes: one each for the left, forward and right turns. Based on those images, the CNN can calculate the possible driving paths. Adding the map on top of that ensures we turn onto a proper road instead of e.g. a driveway. Finally, adding the route map lets the CNN pick one definitive path.