While training a perceptron to recognize letters, I ran into the problem that it remembers only the last letter from the set of training samples.

Learning process:

The training cycle is run several dozen times over the entire array of patterns. For each pattern (a sketch of this step follows the list):

  1. The input is the vector of pixel values of the letter; the network's output is compared with the target vector, in which all values are 0 except for a 1 at the position corresponding to the letter, i.e. {1, 0, 0, ...} for A, {0, 1, 0, ...} for B, and so on.

  2. The errors are corrected by backpropagation.

  3. Weights are updated
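
Roughly, the per-pattern step looks like the sketch below (Python/NumPy here is only for illustration; the layer sizes, the learning rate and names like `train_on_pattern` are placeholders, not my actual code):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    n_in, n_hidden, n_out = 64, 30, 26            # placeholder sizes: 8x8 pixels, 26 letters
    rng = np.random.default_rng(0)
    W1 = rng.uniform(0.1, 0.3, (n_hidden, n_in))  # my initialization range, 0.1..0.3
    W2 = rng.uniform(0.1, 0.3, (n_out, n_hidden))
    lr = 0.5                                      # placeholder learning rate

    def train_on_pattern(x, letter_index):
        """Forward pass, compare with the one-hot target, backpropagate, update weights."""
        global W1, W2
        t = np.zeros(n_out)
        t[letter_index] = 1.0                     # {1,0,0,...} for A, {0,1,0,...} for B, ...
        h = sigmoid(W1 @ x)                       # hidden layer
        y = sigmoid(W2 @ h)                       # output layer
        d_out = (y - t) * y * (1 - y)             # output error (sigmoid derivative is y*(1-y))
        d_hid = (W2.T @ d_out) * h * (1 - h)      # error propagated back to the hidden layer
        W2 -= lr * np.outer(d_out, h)             # weight update
        W1 -= lr * np.outer(d_hid, x)
        return y

    # several dozen passes over the whole pattern array
    patterns = [(rng.random(n_in), i) for i in range(n_out)]   # dummy data for illustration
    for epoch in range(50):
        for x, idx in patterns:
            train_on_pattern(x, idx)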

Training on any single letter works fine: after a few repetitions the value of the desired output neuron becomes almost 1 and the rest become almost 0. But if I run through all the letters from A to Z in sequence and then test on, say, the letter B, only the neuron corresponding to the last training pattern, i.e. Z, becomes active.

What could be the error?

Update

To simplify things, I tested on digits: one hidden layer with 30 neurons (I also tried 300, the result is the same, the outputs just get closer to 0 and 1) and an output layer with 10 neurons. Iterations: from 10 to 100. Could the problem be the weight initialization (values from 0.1 to 0.3)? If any letter is fed into the untrained network, the value of every hidden neuron is almost 1 (or exactly 1 if there are far more neurons), i.e. A and Z look the same to the perceptron.
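
To illustrate that last observation (the numbers here are made up, just to show the effect): with all weights positive in [0.1, 0.3] and a few dozen inputs in [0, 1], the weighted sum into every hidden neuron is already large, so the sigmoid saturates near 1 regardless of the pattern:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    x = rng.random(64)                   # any input pattern with pixel values in [0, 1]
    w = rng.uniform(0.1, 0.3, 64)        # all-positive weights, as in my initialization

    s = w @ x                            # weighted sum, roughly 64 * 0.5 * 0.2 ~ 6.4
    print(s, sigmoid(s))                 # sigmoid(6.4) ~ 0.998, so the neuron sits near 1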

  • How many layers and how many neurons? Maybe there simply aren't enough of them? And a few dozen passes is really not much for training. - Yaant
  • A few more questions: is the neural network your own implementation, or is some library used? With your own implementation there is a somewhat higher chance that an error has crept in somewhere. And what is the neuron activation function? - Yaant
  • It's my own implementation. The activation function is the sigmoid. It now turns out that the weights get adjusted so that the last sample is recognized correctly, and when any other sample is presented the network seems to treat it as the same as the last one. The outputs being identical the first time any sample is shown looks normal, since the weights are all in roughly the same range and the sigmoid normalizes the sum of the inputs. - loover
  • I have to admit I've largely forgotten the theory, but as far as I remember the backpropagation algorithm has a parameter, the step (learning rate) by which the weights are modified. Have you tried reducing it? - Yaant
  • The result is the same; the maximum response of the output neuron is just less close to 1. - loover

2 answers

  1. Weights can be negative, so it is worth initializing them from the range [-0.3, 0.3].
  2. It is better to train in epochs that include all the letters of the alphabet, i.e. run through all the patterns within a single epoch (see the sketch after this list).
  3. It may also be worth reducing the learning rate.
  4. A few dozen passes is very little. It took me about 1000 epochs to recognize 10 digits :)
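
Putting points 1-4 together, the training schedule I have in mind looks roughly like this (a sketch with placeholder sizes and random data, just to show the structure; shuffling the order each epoch is optional and not one of the points above):

    import random
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    n_in, n_hidden, n_out = 64, 30, 10
    W1 = rng.uniform(-0.3, 0.3, (n_hidden, n_in))   # point 1: allow negative weights
    W2 = rng.uniform(-0.3, 0.3, (n_out, n_hidden))
    lr = 0.1                                        # point 3: modest learning rate

    # placeholder data: one random "image" per class, just to show the loop structure
    patterns = [(rng.random(n_in), k) for k in range(n_out)]

    for epoch in range(1000):                       # point 4: on the order of 1000 epochs
        random.shuffle(patterns)                    # point 2: every pattern in every epoch
        for x, label in patterns:
            t = np.zeros(n_out); t[label] = 1.0
            h = sigmoid(W1 @ x)
            y = sigmoid(W2 @ h)
            d_out = (y - t) * y * (1 - y)
            d_hid = (W2.T @ d_out) * h * (1 - h)
            W2 -= lr * np.outer(d_out, h)
            W1 -= lr * np.outer(d_hid, x)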

    I would not recommend a hand-written implementation of backprop. Not only is it easy to get wrong, it will most likely also be less optimized than the standard libraries. On Coursera there is an excellent course by Andrew Ng on the basics of ML, and it covers the backpropagation algorithm.

    Here you can see the implemented backprop. And the course itself is here.

    Without code it is very difficult to say anything. If you post it, we can discuss it!