The structure of the question is as follows: first I define collective recognition, then I describe the different methods of collective classification that I have found, and at the end I state my question. Those who are already old hands in this field may not need the explanations of what it is and which methods exist; you can just skim the method headings and go straight to the question.


What is collective recognition / classification?

Collective (group) recognition means using multiple classifiers, each of which decides on the class of an entity, situation, or image, and then merging and reconciling the individual classifiers' decisions with a combining algorithm. Using multiple classifiers generally leads to higher recognition accuracy and, in some cases, better computational efficiency.

Some approaches to combining classifier decisions:

  1. methods based on the concept of classifier competence areas, using procedures that evaluate each classifier's competence with respect to each input of the classification system;

  2. methods of combining decisions based on neural networks.


Competence area method

The idea of collective classification based on competence areas is that each base classifier may work well in a certain region of the feature space (its competence area), exceeding the remaining classifiers there in accuracy and reliability. The competence area of each base classifier must somehow be assessed; the corresponding program is called the referee. Classification is then performed so that each algorithm is used only within its competence area, i.e., where it gives better results than the other classifiers. In this case, only one classifier's decision is taken into account in each region. However, some algorithm is needed that, for any input, determines which classifier is the most competent.

One approach pairs each classifier with a special algorithm (a referee) designed to assess that classifier's competence. The competence of a classifier in a given region of the object-representation space is understood as its accuracy there, i.e., the probability of correctly classifying objects whose descriptions fall in that region.

The general scheme of training a collective recognizer based on competence assessment consists of two steps (Fig. 1). In the first step, each base classifier is trained and tested; this step is no different from conventional supervised learning. In the next step, after testing each classifier, the sample used at its testing stage is split into two subsets, L+ and L−. The first subset contains the test instances that were classified correctly; the second contains the remaining instances, i.e., those classified erroneously. Treating these sets as the classifier's areas of competence and incompetence, respectively, they can be used as training data for the referee algorithm. When classifying new data, the referee's task is to determine, for each input example, whether it belongs to the algorithm's competence area and, if so, what the probability of its correct classification is. The referee then delegates the classification problem to the most competent classifier.
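As a minimal illustration of this scheme (my own sketch, assuming already-fitted scikit-learn-style base classifiers; the logistic-regression referee and the function names are illustrative choices, not from the source):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_referees(base_classifiers, X_test, y_test):
    """Train one referee per base classifier to predict whether that
    classifier labels an input correctly (L+ vs. L-)."""
    referees = []
    for clf in base_classifiers:
        correct = (clf.predict(X_test) == y_test).astype(int)  # 1 = L+, 0 = L-
        referees.append(LogisticRegression(max_iter=1000).fit(X_test, correct))
    return referees

def predict_collective(base_classifiers, referees, X):
    """Delegate each example to the classifier its referee deems most competent."""
    # competence[i, j] = estimated probability that classifier j is correct on example i
    competence = np.column_stack([r.predict_proba(X)[:, 1] for r in referees])
    best = competence.argmax(axis=1)
    return np.array([base_classifiers[j].predict(X[i:i + 1])[0]
                     for i, j in enumerate(best)])
```

Note that each referee is trained on the same held-out sample that was used to test its base classifier, exactly as the two-step scheme above prescribes.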


Neural network approaches

Neural network approaches to collective classification divide into methods that combine classifiers by means of a neural network, ensembles of neural networks, and methods that use neural networks built from modules.

Neural network for combining classifiers

One approach uses a neural network to combine the decisions of the base classifiers (Fig. 2).

The output of each base classifier is a decision vector (a vector of soft labels) whose element values belong to some numerical interval [a, b]. These values are fed into a neural network trained to integrate the decisions of the base-level classifiers; its output is a decision in favor of one class or another. The network's output can also be a vector whose dimension equals the number of object classes, holding at each position some measure of confidence in the corresponding class; in that case, the class with the maximum confidence value is chosen as the decision.

The decision-combining system works as follows (a minimal sketch follows the list):

  1. a set of base classifiers is selected and trained;
  2. meta-data for training the neural network are prepared: the base classifiers are tested on a labeled data sample, and for each test example the decision vector of the base classifiers is formed, with an extra component appended that holds the name of the test example's true class;
  3. the meta-data sample is used to train the neural network that integrates the decisions.
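A minimal stacking sketch along these lines, assuming scikit-learn and its built-in 8x8 digit images; the base models and the MLP combiner are my own illustrative choices:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_meta, y_train, y_meta = train_test_split(X, y, test_size=0.5)

# 1. select and train the base classifiers
bases = [RandomForestClassifier().fit(X_train, y_train),
         LogisticRegression(max_iter=2000).fit(X_train, y_train)]

# 2. meta-data: the concatenated decision vectors (soft labels) of the
#    base classifiers, paired with the true class of each held-out example
meta_features = np.hstack([b.predict_proba(X_meta) for b in bases])

# 3. train the combining neural network on the meta-data
combiner = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000)
combiner.fit(meta_features, y_meta)

def predict(X_new):
    """Combined decision: the class with maximum confidence at the network output."""
    return combiner.predict(np.hstack([b.predict_proba(X_new) for b in bases]))
```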

Modular Neural Network Method

For modular neural networks it is proposed to use a so-called gating network: a neural network that assesses the competence of the classifiers for a particular input vector presented to them. This variant applies the neural network paradigm to combining decisions based on competence assessments; the corresponding theory is known as the mixture of experts. Each classifier is assigned a "referee" that predicts its degree of competence with respect to the particular input supplied to the set of base-level classifiers (Fig. 3).

Depending on the input vector X, the decisions of different classifiers can be selected and combined into a joint decision. The number of inputs of the gating network equals the dimension of the input feature vector; the number of its outputs equals the number of classifiers, L. The gating network is trained to predict the competence of each classifier for a given input vector, i.e., to estimate the probability that the classifier will produce the correct decision. The degree of competence is expressed as a number in the interval [0, 1].
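A minimal mixture-of-experts sketch (my own, in PyTorch, which is an assumed framework choice; the expert and gate sizes are illustrative):

```python
import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    def __init__(self, in_dim, n_classes, n_experts):
        super().__init__()
        # each expert is a small classifier over the same input space
        self.experts = nn.ModuleList(
            [nn.Linear(in_dim, n_classes) for _ in range(n_experts)]
        )
        # the gating network maps the input to one competence weight per
        # expert; softmax keeps the weights in [0, 1] and summing to 1
        self.gate = nn.Sequential(nn.Linear(in_dim, n_experts),
                                  nn.Softmax(dim=-1))

    def forward(self, x):
        w = self.gate(x)                                    # (batch, n_experts)
        outs = torch.stack([e(x) for e in self.experts], 1)  # (batch, n_experts, n_classes)
        return (w.unsqueeze(-1) * outs).sum(dim=1)           # competence-weighted decision

model = MixtureOfExperts(in_dim=64, n_classes=10, n_experts=4)
logits = model(torch.randn(8, 64))  # joint decision for a batch of 8 inputs
```

Here the softmax gate plays the role of the referees: its outputs are the per-expert competence estimates that weight each expert's decision.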


Neural Network Ensembles

Another proposed architecture for decision integration consists of several experts (neural networks). Combining the knowledge of neural networks in an ensemble has proven effective, demonstrating the promise of collective recognition technologies in overcoming the "fragility" problem.

An ensemble of neural networks is a set of neural network models that reaches a decision by averaging the outputs of the individual models. Depending on how the ensemble is constructed, it addresses one of two problems: the tendency of the base neural network architecture to underfit (addressed by the boosting meta-algorithm) or the tendency of the base architecture to overfit (addressed by the bagging meta-algorithm).

There are several universal voting schemes for determining the winning class (a sketch follows the list):

  1. maximum - the class with the maximum response from any ensemble member wins;
  2. averaging - the class with the highest average response across ensemble members wins;
  3. majority - the class with the largest number of ensemble members' votes wins.
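The three schemes can be written down directly (my own sketch, assuming each ensemble member outputs a vector of per-class scores):

```python
import numpy as np

def vote(member_outputs, scheme="majority"):
    """member_outputs: array of shape (n_members, n_classes)."""
    scores = np.asarray(member_outputs)
    if scheme == "maximum":      # class with the single highest response
        return np.unravel_index(scores.argmax(), scores.shape)[1]
    if scheme == "averaging":    # class with the highest mean response
        return scores.mean(axis=0).argmax()
    if scheme == "majority":     # class most members rank first
        votes = scores.argmax(axis=1)
        return np.bincount(votes, minlength=scores.shape[1]).argmax()
    raise ValueError(scheme)

outputs = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.6, 0.2]]
print(vote(outputs, "maximum"), vote(outputs, "averaging"), vote(outputs, "majority"))
```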

Some other methods

There are also ensemble machine-learning algorithms such as the following (a usage sketch follows the list):

  • Random forest (which uses a committee (ensemble) of decision trees)
  • AdaBoost (a boosting algorithm that strengthens classifiers by combining them into a committee; proposed by Yoav Freund and Robert Schapire)
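Both are available off the shelf in scikit-learn; a quick sketch on its built-in 8x8 digit images (my own example, used as a stand-in for real character data):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)

for clf in (RandomForestClassifier(n_estimators=100),
            AdaBoostClassifier(n_estimators=100)):
    # mean 5-fold cross-validated accuracy of each committee method
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(type(clf).__name__, round(score, 3))
```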

My question

The question is which collective recognition scheme is better to use for recognizing characters / digits / license plates. The sources from which I took the information about the various group classification schemes date from 2006, and I am afraid that some methods may have become outdated. Which scheme would be more rational to use, in terms of how relevant each method still is?

I have mentioned the following schemes:

  • based on competence areas
  • based on a single neural network for combining classifiers
  • based on a modular neural network
  • based on neural network ensembles
  • the committee methods Random forest and AdaBoost

Which of these approaches can potentially give the best accuracy and performance for recognizing characters / digits / license plates? Perhaps some of the methods are outdated or have proven inadequate in certain areas. Perhaps there are other, more effective and up-to-date methods of collective recognition (group classification).


Sources with detailed descriptions (from which I took the information about the collective recognition methods):

Methods and algorithms for collective recognition: Overview (V.I. Gorodetsky, 2006, pdf)

Current issues of using convolutional neural networks and their committees in recognizing digit patterns (Kuzmitsky NN, 2012, pdf)

Random forest (Wiki)

AdaBoost (Wiki)

  • @ErbaAitbayev If the answer to this question has already been given on another site, it is worth translating that answer. Translations of questions/answers with the source indicated are welcome. – Grundy
  • I'm not an expert, but I think you can look at which methods have been used here: yann.lecun.com/exdb/mnist – Ilya Bizunov
  • I deeply apologize for interfering in your highly scientific discussion with yourself, but I still want to note, first, that such "questions", which seem to aspire to be Ph.D. thesis topics, are not entirely appropriate for SO; and second, if you are interested in "not checkers, but Go", I recommend this article: codeproject.com/articles/238114/realtime-webcam-sudoku-solver . The author, without invoking "redundant entities" or "ensembles of neural networks", solves the problem you pose perfectly well in simple C++ code :) – SeNS
  • I will also note, AFAIK, that algorithms similar to those used by Bojan Banco in the Sudoku Solver (though, of course, much more complex, and with far more patterns) are still used in all modern OCR systems, including FineReader (one of the most brilliant programs on this market). The most important difference of the article above is that it is a 100% working solution that anyone can check, unlike the "high-minded" articles on "modular neural networks" and the like that turn up repeatedly in Ph.D. theses (of which the really working ones are close to zero). – SeNS
  • @Chichi downvotes are anonymous; nothing can be done about that. Sometimes this works badly, but a non-anonymous system would work much worse. – Nick Volynkin

1 answer

This answer mixes a translation of user DW♦'s answer to this question on the English site with answers from other StackExchange users to my similarly worded question on Cross Validated.


DW's answer to this question:

Modern methods for recognizing digit patterns do not rely on collective recognition, competence areas, ensembles, or the other algorithms mentioned in the question.

Instead, modern methods for recognizing digits use convolutional neural networks. Just one convolutional neural network: there is no need for several CNNs or any other tricks. The emphasis is instead on the specific architecture of the network (e.g., how many layers, what kind of pooling) and on the training procedure (e.g., learning-rate schedules for stochastic gradient descent, dropout, batch normalization, and so on).
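For concreteness, a minimal single-CNN sketch in PyTorch (my own illustration, not from DW's answer; the layer counts and sizes are arbitrary):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                    # the pooling choice is part of the design
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Dropout(0.5),                    # one of the regularization tricks listed above
    nn.Linear(64 * 7 * 7, 10),          # 10 classes for 28x28 MNIST-style digit images
)
```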

As far as I know, this is also true for object / character recognition.

My advice is this: even though the algorithms mentioned in the question look beautiful on paper, they may not actually be necessary or useful for the problems that interest you. I do not recommend using them in practice.


Part of Franck Dernoncourt's answer to this question:

One of the modern image-classification systems obtained a tangible advantage by using an ensemble (as do most other such systems, as far as I know). This is reflected in the paper by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun (Microsoft researchers):

Deep residual learning for image recognition (ENG, 2015, PDF)

The article contains the tables:

Table 4. Error rates (%) of single models (without ensembles) on the ImageNet validation set.

Table 5. Error rates (%) of ensembles; top-5 error on the ImageNet test set.

P.S. Top-5 error is a metric under which the algorithm may output 5 candidate classes for an image; an error is counted only if the correct class is not among them (a small computation sketch follows below).
Note: an explanation of top-1 and top-5 error is also given here:
ImageNet: what is top-1 and top-5 error rate?
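A small sketch of how the top-5 error is computed (my own illustration, given per-class scores for each image):

```python
import numpy as np

def top_k_error(scores, true_labels, k=5):
    """scores: (n_images, n_classes); error if the true class is not
    among the k highest-scoring candidates."""
    top_k = np.argsort(scores, axis=1)[:, -k:]   # indices of the k best classes per image
    hits = (top_k == np.asarray(true_labels)[:, None]).any(axis=1)
    return 1.0 - hits.mean()
```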

The MSRA team and their ResNet models won first place in the Large Scale Visual Recognition Challenge 2015 (ILSVRC 2015).


Part of jean's answer to the same question:

I suppose that deep learning is quite relevant for most computer vision tasks (recognition, detection, super-resolution, edge detection, etc.), except for specific tasks such as SLAM, where deep learning does not yet match existing methods.

Ensembles and averaging are often used to gain a few percentage points in order to win a recognition competition. But the networks are becoming so good that this no longer affects the results as much.

I would say that for the next 20 years most computer vision products will use deep learning, even if something more effective appears. To add to Franck's answer: deep learning algorithms change so quickly that Kaiming He's ResNets are already no longer the latest word. Right now, on the CIFAR and SVHN benchmarks, the spotlight is on:
- "Densely Connected Convolutional Networks" and
- "SGDR: Stochastic Gradient Descent with Restarts".
But even these results may soon cease to be relevant, with ILSVRC 2016 close at hand.