On the big image (screen shot) there are a lot of different small images, each of which at the time of the screen shot can be in one of the states (frames). It is necessary to find the X, Y coordinates of each of the small images on the large image. Obviously, this can be done according to the principle of brute force, i.e. going through each pixel of a large image and comparing with each of the possible states (Number of small images * number of frames) - but this is a very long and bad option - so I hope someone will tell a more beautiful and fast algorithm for solving this problem.

Important reservations:

  1. When solving this problem, it is assumed that each frame of each small image is known (available as a separate image file), and when it appears on a large image, it does not rotate or scale.

  2. there is no need to establish what kind of image this is - only the coordinates on the large image are important.

  3. The program is written in C #, but in principle I am looking for not ready-made code, but a description of the action algorithm, so the language is not important in principle, ideally, it is generally a sequence of actions not tied to any language.

    1 answer 1

    Yes, classically the problem is solved through the enumeration of all images and its complexity is appropriate. But it is possible to significantly simplify the task and there are a couple of methods for this (and they can be used in combination).

    • The image can be converted to monochrome. One byte per pixel. In this mode, the comparison will go much faster. And when a match is found, you can always "double-check on a full-color copy."
    • There is no need for each pixel to compare all images in turn. You can go through all the pictures, subtract the first 8-16 pixels and remember. Then build a tree with them. Now enough to move the "window" in the original image and compare. If the first bytes match, it makes sense to check the entire picture. If you correctly make a buffer for pixels (byte-wide), then the run across the entire image can be almost O (n), if the specified pictures are not found. And if you read the KMP algorithm (Knut-Maurice-Prayt to search for a substring in a string) and make suffixes for searching correctly, you can do it even faster.

    I once solved a very similar problem, it was necessary to compare two pictures and determine whether there was a shift of a part of the image (by the way, this is a classic task that MPEG2 or MPEG4 encoder solves) and transferring the image to one-byte representation greatly accelerated the process. But I went even further - I did the packing of identical bytes (this is a RLE).