Suppose you need to determine from a photo whether a person is smiling. We collect a set of pictures of smiling and non-smiling people and define the training target: the output is 1 if the person is smiling, 0 if not. Photos are fed in as arrays. Now the key question: the photos all have different sizes, so they need to be brought to a common resolution. And logically it is not enough just to rescale them — the faces should also be aligned (roughly speaking, centered in every photo). But faces differ, so you cannot align them on any single facial feature (the distance between the eyes, the width of the face, etc.). On top of that, each photo has its own color distribution, which also needs to be normalized to some common level.

How is data preparation done in this case? Do I understand correctly that you can skip data preparation, but then you would need many more photos for training?
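For the color normalization part of the question, one common approach (a sketch, not the only option) is to scale pixel values and standardize each image to zero mean and unit variance; the helper name here is hypothetical:

```python
import numpy as np

def normalize_image(img):
    """Hypothetical helper: scale uint8 pixels to [0, 1], then
    standardize the image to zero mean and unit variance."""
    x = img.astype(np.float32) / 255.0
    # small epsilon guards against division by zero on flat images
    return (x - x.mean()) / (x.std() + 1e-8)
```

This puts every photo on a comparable brightness/contrast scale before it reaches the network.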

  • You can use face-detection libraries (for example, OpenCV) for preprocessing: detect the face and crop the photo down to it. The resulting face images should then be downscaled to keep things simple; 256 × 256 is plenty. Indeed, you can skip data preparation, but then the network will be more complex, slower, and most likely less accurate — Nikita Vasilchenko
