In short, in a recognition task the software toolkit must always be chosen for the specific type of object. If the conversation starts with something like "why don't we just find and recognize anything that happens to land in the frame", then to hell with such a task. Nobody has invented a universal object classifier yet. Even if some software does let you search for arbitrary objects, in reality it is a whole complex of different solutions and algorithms bundled together.
Next, you need to consider what type of object is to be recognized. What is its geometric shape? From what angles will it be filmed? How much does the lighting vary? Is the object moving or static? Is there a set of images for machine learning (a "set" usually meaning hundreds of thousands of samples or more)? And there are many other conditions and nuances. Even an object as seemingly trivial to a human as a dog produces many variations. For example: will the frame contain only the muzzle (full face, in profile, or even upside down) or the whole body, what color is it, what about the coat (some breeds are shaggier, some less so), and so on.
All these differences are fundamental for almost any recognition algorithm you might want to apply. Some algorithms are insensitive to color and shade but sensitive to the shooting angle. Others are insensitive to rotation and tilt but highly sensitive to the shape of the object.
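To make the trade-off above concrete, here is a minimal sketch with two toy shape descriptors. Both are illustrative assumptions, not algorithms from any particular library: `bbox_size` changes when the shape is rotated, while `pairwise_distances` survives rotation but reacts to any change of shape. Neither one looks at color at all.

```python
from itertools import combinations

def rotate90(points):
    """Rotate integer points by 90 degrees counter-clockwise about the origin."""
    return [(-y, x) for x, y in points]

def bbox_size(points):
    """Width and height of the bounding box: cheap, but rotation-sensitive."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)

def pairwise_distances(points):
    """Sorted squared distances between all point pairs: rotation-insensitive."""
    return sorted(
        (ax - bx) ** 2 + (ay - by) ** 2
        for (ax, ay), (bx, by) in combinations(points, 2)
    )

# Two hypothetical toy shapes made of five pixels each.
L_SHAPE = [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0)]
BAR = [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)]

turned = rotate90(L_SHAPE)
print(bbox_size(L_SHAPE), bbox_size(turned))                       # differ after rotation
print(pairwise_distances(L_SHAPE) == pairwise_distances(turned))   # same shape, rotated
print(pairwise_distances(L_SHAPE) == pairwise_distances(BAR))      # different shapes
```

Real descriptors are far more elaborate, but the pattern is the same: every invariance is bought with a sensitivity to something else.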
Contour analysis is an excellent choice, but, as has already been correctly noted in the question, it suffers badly from needing so-called ideal conditions. In the case of a dog, ideal conditions would mean painting it evenly all over in some color, say red, and putting it in a room with a white floor and white walls. Instead of flickering lamps, install spotlights in the corners that give a steady, unblinking light. After that, the animal, exhausted by these intolerable conditions, must be made to sit quietly and not move, so that the geometric shape of the contours detected in the frame varies only slightly. Of course, the software produced by such an experiment will only be able to recognize red dogs sitting in a white room with spotlights. And the benefit is doubtful in general, because it is enough to relax the conditions slightly, making them less ideal for contour analysis, and such an algorithm will immediately stop working.
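How fragile those ideal conditions are can be shown with a toy example. This is only a sketch under assumed numbers: an 8x8 grayscale scene, a fixed threshold of 128, and a hypothetical `scene` helper that optionally darkens the right half of the frame to imitate a shadow.

```python
THRESHOLD = 128  # assumed fixed brightness cut-off, tuned for the "ideal" scene

def segment(image, threshold=THRESHOLD):
    """Return coordinates of pixels darker than the threshold (the 'object')."""
    return {
        (r, c)
        for r, row in enumerate(image)
        for c, px in enumerate(row)
        if px < threshold
    }

def scene(shadow=0):
    """Build an 8x8 scene: bright background (220) with a dark 2x2 object (40).

    `shadow` is subtracted from every pixel in the right half of the frame.
    """
    image = []
    for r in range(8):
        row = []
        for c in range(8):
            px = 40 if 3 <= r <= 4 and 1 <= c <= 2 else 220
            if c >= 4:
                px = max(0, px - shadow)
            row.append(px)
        image.append(row)
    return image

ideal = segment(scene())
shadowed = segment(scene(shadow=120))
print(len(ideal), len(shadowed))
```

In the evenly lit scene only the four object pixels are selected. With the shadow, the whole right half of the frame falls below the threshold too, so any contour traced from that mask no longer outlines the object.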
Contours are a two-dimensional shape. That is, in fact, a small amount of information to process, which on the one hand saves computing resources and speeds up the algorithm, but on the other hand means that certain features are missing, and without them recognition in some cases becomes simply impossible.
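Both sides of that trade-off show up even in a tiny sketch. The code below assumes a 6x6 grayscale image, a threshold of 128 and a simple 4-neighbour boundary rule; `binarize`, `contour` and `make_image` are illustrative helpers, not functions from a real library.

```python
def binarize(image, threshold=128):
    """Mark dark pixels as foreground (1) and bright pixels as background (0)."""
    return [[1 if px < threshold else 0 for px in row] for row in image]

def contour(mask):
    """Foreground pixels that touch background or the image edge (4-neighbours)."""
    h, w = len(mask), len(mask[0])

    def is_background(r, c):
        return not (0 <= r < h and 0 <= c < w) or mask[r][c] == 0

    return {
        (r, c)
        for r in range(h)
        for c in range(w)
        if mask[r][c] == 1
        and any(is_background(r + dr, c + dc)
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)))
    }

def make_image(interior):
    """6x6 bright image (255) with a 4x4 square whose pixels come from interior(r, c)."""
    return [
        [interior(r, c) if 1 <= r <= 4 and 1 <= c <= 4 else 255 for c in range(6)]
        for r in range(6)
    ]

plain = make_image(lambda r, c: 20)
spotted = make_image(lambda r, c: 20 if (r + c) % 2 == 0 else 90)

plain_contour = contour(binarize(plain))
spotted_contour = contour(binarize(spotted))
print(len(plain_contour), "contour pixels out of", 6 * 6)
print(plain_contour == spotted_contour)
```

The contour keeps 12 of the 36 pixels, which is where the speed comes from. It is also identical for the plain and the spotted square, so whatever distinguished the two objects is gone before recognition even starts.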

Try to guess which of the pictures shows the digit 1 and which shows the digit 7. Or maybe both are ones, or both are sevens... it depends on the angle you look from and the font that is used...
Obviously, contour analysis is not suitable for recognizing objects like a dog (or indeed any object that does not have a constant shape), nor for cases where objects can be deformed, which is the norm in a natural environment.
In general, an interesting thing emerges: it is as if the inhabitants of a two-dimensional world were trying to look at three-dimensional objects. A cube would appear to them as a square, and a ball as a circle. In the same way, the camera does not see what the human brain is able to reconstruct, filling in on its own the details that the eyes did not catch. This must be kept in mind.
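The flat-world analogy is easy to reproduce. The sketch below assumes the simplest possible camera, an orthographic projection that just drops the depth coordinate; `project`, `sphere_points` and `outline_radius` are hypothetical helper names introduced for illustration.

```python
import math
from itertools import product

def project(points):
    """Orthographic projection along z: keep (x, y), throw the depth away."""
    return {(x, y) for x, y, _ in points}

# Corners of a unit cube and of a flat unit square lying in the plane z = 0.
CUBE = list(product((0, 1), repeat=3))
SQUARE = [(x, y, 0) for x, y in product((0, 1), repeat=2)]

def sphere_points(steps=12):
    """Sample points on a unit sphere on a latitude/longitude grid."""
    points = []
    for i in range(steps + 1):
        lat = -math.pi / 2 + math.pi * i / steps
        for j in range(steps):
            lon = 2 * math.pi * j / steps
            points.append((
                math.cos(lat) * math.cos(lon),
                math.cos(lat) * math.sin(lon),
                math.sin(lat),
            ))
    return points

def outline_radius(points):
    """Radius of the projected silhouette, measured from the origin."""
    return max(math.hypot(x, y) for x, y in project(points))

print(project(CUBE) == project(SQUARE))   # the cube's corners land on the square's
print(outline_radius(sphere_points()))    # the ball's outline is a circle of radius ~1
```

Eight cube corners collapse onto the four corners of a square, and the sampled ball fits exactly inside a unit circle. From this single view nothing in the projected data says whether the original object had depth.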