Hello!

I have a task of a kind I have never dealt with before.

Task: identify an object in a photo (the picture quality may be good or bad).

Goal: carry out various operations on a specific object (for example, I want to find a dog in a photo and determine its color and breed).

Target platform: iOS

After a couple of hours of searching the Internet, I came across this article.

The author discussed OpenCV at length, along with the following sequence of stages:

  1. pre-filtering and image preparation
  2. logical processing of filtering results
  3. decision-making algorithms based on the logical processing

I studied each stage and drew some conclusions for myself:

Point 1: for pre-filtering, I think contour filtering is the best option (contours are very useful when we want to move from working with an image to working with the objects in that image).

Point 2: I have no idea how to logically process the results. Contour analysis seems to solve my problem, but it also needs ideal conditions (which I probably will not have, or at least not always).

Point 3 remains a mystery to me.

Question: am I heading in the right direction? Am I even reading the right material? Maybe my questions are very stupid, but where else, if not here, can I find answers to them?

P.S. I also found this statement:

OpenCV is not for smartphones, because on ARM it is sheer torment.

Taken from here.

Thank you all for your attention!

  • Hm. This is a task for a large research project. There are three or four implementations in the world; do you want someone to write a prototype of a fifth? - VladD 4:34 pm
  • @VladD no, I am not asking for a fifth prototype. I asked whether I am heading in the right direction, whether I am reading the right literature, and whether it is worth taking on this task alone or whether it is more a task for a group of programmers / mathematicians. The question seems clearly formulated to me, so I cannot understand why you are telling me this. - kxko
  • @kxxko, you are still asking the wrong questions. "Whether this, whether that" ... these are all very general questions, and general answers to them will be of no use to you personally or to anyone else. Ask about specifics that come up in practice, and then people here will probably help you. And of course one cannot disagree with Vlad that the chances of implementing your plans are minimal. Nevertheless, I personally think that this is sometimes not the main thing: practical experience is always welcome, even if it is negative. - alexis031182
  • @kxxko: The first part, contour recognition, is quite a simple step. But going from contours to a specific concept (that is, understanding that this curve is the contour of a dog and not a cat) is a task for artificial intelligence. This does not mean that it is fundamentally unsolvable; it means that it is complex. - VladD
  • @kxxko: It seems to me that it is worth it. The task is complex and non-trivial, and if you find a solution, even a partial one, it will interest many people. (Perhaps one answer with updates is better.) - VladD

1 answer

In short, in a recognition task the software toolkit must always be chosen for the specific type of object. If the conversation starts with something like "why don't we find and recognize everything that happens to fall into the frame", then such a task is best dropped. A universal object classifier has not been invented yet. Even if some software does let you search for arbitrary objects, it is in fact a whole complex of different solutions and algorithms put together.

Next, you should consider what type of object is to be recognized. What is its geometric shape? From what angles will it be filmed? Does the lighting vary strongly? Is the object moving or static? Is there a set of images for machine learning (a "set" usually means hundreds of thousands of samples or more)? And there are many other conditions and nuances. Even an object as seemingly trivial for a human as a dog generates many variations. For example: will the frame contain only the muzzle (full face, profile, or even upside down) or the whole body; what about the color; what about the coat (sleeker in some breeds, less so in others); and so on.

All these differences are absolutely fundamental for almost any recognition algorithm you might want to apply. Some algorithms are insensitive to color and shade but sensitive to the shooting angle. Others are insensitive to rotation and tilt but highly sensitive to the shape of the object.
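To make this trade-off concrete, here is a minimal sketch in plain Python. It is only a toy illustration, not a real recognition algorithm: the 3x3 "image" and the helper names (`rotate90`, `recolor`, `color_histogram`, `shape_mask`) are made up for the example. A color histogram survives rotation but not repainting, while a shape mask behaves the opposite way.

```python
from collections import Counter

# A tiny 3x3 "image": "R" is a red object pixel, "." is background.
image = [
    ["R", "R", "."],
    ["R", ".", "."],
    [".", ".", "."],
]

def rotate90(img):
    """Rotate a square image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def recolor(img, old, new):
    """Replace every pixel of color `old` with color `new`."""
    return [[new if p == old else p for p in row] for row in img]

def color_histogram(img):
    """Count pixels per color; ignores where the pixels are."""
    return Counter(p for row in img for p in row)

def shape_mask(img, background="."):
    """Mark which pixels belong to the object; ignores their color."""
    return [[p != background for p in row] for row in img]

rotated = rotate90(image)
repainted = recolor(image, "R", "G")

# The histogram feature: stable under rotation, broken by a color change.
hist_same_after_rotation = color_histogram(image) == color_histogram(rotated)
hist_same_after_repaint = color_histogram(image) == color_histogram(repainted)

# The shape feature: stable under a color change, broken by rotation.
mask_same_after_rotation = shape_mask(image) == shape_mask(rotated)
mask_same_after_repaint = shape_mask(image) == shape_mask(repainted)

print(hist_same_after_rotation, hist_same_after_repaint)
print(mask_same_after_rotation, mask_same_after_repaint)
```

Neither feature is "better"; which one fits depends entirely on which variations the real object will show in front of the camera.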

Contour analysis is an excellent choice, but, as was correctly noted in the question, it suffers from needing so-called ideal conditions. In the case of a dog, the ideal conditions would be to paint it evenly all over in some color, for example red, and put it in a room with a white floor and walls. Instead of flickering lamps, install spotlights in the corners that give steady, non-flickering light. After that, the animal, exhausted by these intolerable conditions, must be made to sit quietly without moving, so that the geometric shape of the contours detected in the frame varies only slightly. Of course, the software produced by such an experiment will only be able to recognize red dogs sitting in a white room with spotlights. And overall the benefit is doubtful, because as soon as the conditions are relaxed even slightly, making them less ideal for contour analysis, such an algorithm will immediately stop working.
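How fragile this is can be shown with a toy example. The sketch below is an assumption-laden illustration in plain Python, not OpenCV code: the fixed threshold of 128, the 5x5 grayscale "scene" and the functions `threshold` and `contour` are all invented for the demonstration. Under "ideal" lighting a dark object on a bright background yields a clean contour; halve the brightness and the same fixed threshold swallows the background, so no contour is found at all.

```python
def threshold(gray, t=128):
    """Mark pixels darker than t as object (True), the rest as background."""
    return [[v < t for v in row] for row in gray]

def contour(mask):
    """Return object pixels that touch a background pixel (4-neighbourhood)."""
    h, w = len(mask), len(mask[0])
    points = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not mask[ny][nx]:
                    points.append((x, y))
                    break
    return points

# "Ideal conditions": a dark 3x3 object (value 50) on a bright 5x5 background (200).
ideal = [[200] * 5 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 4):
        ideal[y][x] = 50

# The same scene with the lights dimmed to half brightness.
dim = [[v // 2 for v in row] for row in ideal]

ideal_contour = contour(threshold(ideal))
dim_contour = contour(threshold(dim))

print(len(ideal_contour))  # boundary pixels of the 3x3 object
print(len(dim_contour))    # everything is "object" now, so nothing is found
```

Adaptive thresholds help with this particular failure, but each such fix only widens the set of conditions a little; it does not remove the dependence on them.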

Contours are a two-dimensional shape. This is, in effect, a small amount of information to process, which on the one hand saves computing resources and speeds up the algorithm, but on the other hand means that certain features are missing, without which recognition in some cases becomes simply impossible.

[Images: first digit, second digit]

Try to guess which picture shows the digit 1 and which shows 7. Or maybe there are two ones or two sevens... it depends on the angle you look from and which font is used...

Obviously, contour analysis is not suitable for recognizing objects such as a dog (or indeed any object that does not have a constant shape), nor for cases where objects can be deformed, which is the norm in a natural environment.

In general, something interesting emerges: it is as if the inhabitants of a two-dimensional world were trying to look at three-dimensional objects. A cube would appear to them as a square, and a ball as a circle. In the same way, the camera does not see what the human brain is able to model, filling in on its own the necessary details that the eyes did not see. This must be remembered.
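The loss of information is easy to demonstrate. Here is a minimal sketch in plain Python, assuming the simplest possible "camera", an orthographic projection that just drops the z coordinate (the function `project` is hypothetical, written for this example). The corners of a unit cube and the corners of a flat square give exactly the same 2D picture.

```python
from itertools import product

def project(points):
    """Orthographic projection onto the XY plane: simply drop z."""
    return {(x, y) for x, y, z in points}

# The 8 corners of a unit cube.
cube = list(product((0, 1), repeat=3))

# The 4 corners of a flat unit square lying in the plane z = 0.
flat_square = [(x, y, 0) for x, y in product((0, 1), repeat=2)]

cube_shadow = project(cube)
square_shadow = project(flat_square)

# Both shadows are the same four 2D points: the depth is gone.
print(sorted(cube_shadow))
print(cube_shadow == square_shadow)
```

A real camera uses perspective projection rather than this one, but the conclusion is the same: from a single frame, the algorithm has no depth to work with unless something else supplies it.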

  • Before accepting the answer, I read it a couple of times to get a better grasp of it. Thank you for your time. It may seem to some that the answer is incomplete or something else, but it gave me a great idea! Thank you so much for your efforts. The overall picture is clear, and the problems are clear. I will try =) I am sure it will not work, but if it does, it will be my best achievement at this stage of my life. Thanks! - kxko