
Making a neural network: how not to break the brain

Hi, Habr!

In this short note I will talk about two pitfalls that are easy to run into and easy to trip over.

It is about building a trivial neural network in Keras that predicts the arithmetic mean of two numbers.

It would seem, what could be simpler? And indeed, there is nothing complicated here, but there are nuances.

If the topic interests you, welcome under the cut: there will be no long boring descriptions here, just short code and comments on it.

The solution looks something like this:

import numpy as np
from keras.layers import Input, Dense, Lambda
from keras.models import Model
import keras.backend as K

# data generator
def train_iterator(batch_size=64):
    x = np.zeros((batch_size, 2))
    while True:
        for i in range(batch_size):
            x[i][0] = np.random.randint(0, 100)
            x[i][1] = np.random.randint(0, 100)
        x_mean = (x[::,0] + x[::,1]) / 2
        x_mean_ex = np.expand_dims(x_mean, -1)
        yield [x], [x_mean_ex]

# model
def create_model():
    x = Input(name='x', shape=(2,))
    x_mean = Dense(1)(x)
    model = Model(inputs=x, outputs=x_mean)
    return model

# create and train
model = create_model()
model.compile(loss=['mse'], optimizer='rmsprop')
model.fit_generator(train_iterator(), steps_per_epoch=1000, epochs=100, verbose=1)

# predict
x, x_mean = next(train_iterator(1))
print(x, x_mean, model.predict(x))

We try to train it... but nothing comes of it. And at this point you can start dancing with a tambourine and lose a lot of time.

Epoch 1/100
1000/1000 [==============================] - 2s 2ms/step - loss: 1044.0806
Epoch 2/100
1000/1000 [==============================] - 2s 2ms/step - loss: 713.5198
Epoch 3/100
1000/1000 [==============================] - 3s 3ms/step - loss: 708.1110
...
Epoch 98/100
1000/1000 [==============================] - 2s 2ms/step - loss: 415.0479
Epoch 99/100
1000/1000 [==============================] - 2s 2ms/step - loss: 416.6932
Epoch 100/100
1000/1000 [==============================] - 2s 2ms/step - loss: 417.2400

[array([[73., 57.]])] [array([[65.]])] [[49.650894]]

It predicted 49, which is far from the expected 65.

But if we alter the generator a little, everything starts working immediately.

def train_iterator_1(batch_size=64):
    x = np.zeros((batch_size, 2))
    x_mean = np.zeros((batch_size,))
    while True:
        for i in range(batch_size):
            x[i][0] = np.random.randint(0, 100)
            x[i][1] = np.random.randint(0, 100)
        x_mean[::] = (x[::,0] + x[::,1]) / 2
        x_mean_ex = np.expand_dims(x_mean, -1)
        yield [x], [x_mean_ex]

And it is clear that the network converges literally by the third epoch.

Epoch 1/5
1000/1000 [==============================] - 2s 2ms/step - loss: 648.9184
Epoch 2/5
1000/1000 [==============================] - 2s 2ms/step - loss: 0.0177
Epoch 3/5
1000/1000 [==============================] - 2s 2ms/step - loss: 0.0030

The main difference is that in the first case the x_mean array is created in memory anew on every iteration, while in the second it is created once, when the generator is created, and is then only reused.
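
To make that difference concrete, here is a tiny NumPy sketch of my own (not from the original code) that contrasts an in-place write through a slice with rebinding the name to a freshly allocated array:

import numpy as np

# illustrative sketch, not part of the article's code
x_mean = np.zeros((3,))
original = x_mean                # keep a second reference to the original buffer

x_mean[::] = [1, 2, 3]           # in-place write: the existing array object is reused
print(x_mean is original)        # True

x_mean = (np.array([1., 2., 3.]) + np.array([3., 4., 5.])) / 2   # rebinding: a new array is allocated
print(x_mean is original)        # False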

Next, let's check whether everything in this generator is really correct. It turns out that it is not quite.
The following example shows that something is wrong.
def train_iterator(batch_size=1):
    x = np.zeros((batch_size, 2))
    while True:
        for i in range(batch_size):
            x[i][0] = np.random.randint(0, 100)
            x[i][1] = np.random.randint(0, 100)
        x_mean = (x[::,0] + x[::,1]) / 2
        yield x, x_mean

it = train_iterator()
print(next(it), next(it))

(array([[44., 2.]]), array([10.])) (array([[44., 2.]]), array([23.]))

The mean returned by the first call of the iterator does not match the numbers it was supposedly computed from. In fact, the mean was computed correctly, but because the array is passed by reference, the second call of the iterator overwrote its contents, and print() showed what was in the array at that moment rather than what we expected.
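
The same effect can be reproduced without Keras at all. The sketch below (my own illustration, not from the article) keeps references to two consecutive batches and shows that they are literally the same object:

import numpy as np

# illustrative sketch: the generator yields the same buffer every time
def gen(batch_size=1):
    x = np.zeros((batch_size, 2))
    while True:
        for i in range(batch_size):
            x[i][0] = np.random.randint(0, 100)
            x[i][1] = np.random.randint(0, 100)
        yield x, (x[::,0] + x[::,1]) / 2

g = gen()
first_x, first_mean = next(g)
second_x, second_mean = next(g)

print(first_x is second_x)   # True: both calls returned the very same array
print(first_x, first_mean)   # first_x now holds the second batch's numbers,
                             # while first_mean still holds the first batch's mean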

There are two ways to fix this. Both cost extra allocations, but both are correct.
1. Move the creation of the variable x inside the while loop, so that a new array is created for every yield.
def train_iterator_1(batch_size=1):
    while True:
        x = np.zeros((batch_size, 2))
        for i in range(batch_size):
            x[i][0] = np.random.randint(0, 100)
            x[i][1] = np.random.randint(0, 100)
        x_mean = (x[::,0] + x[::,1]) / 2
        yield x, x_mean

it_1 = train_iterator_1()
print(next(it_1), next(it_1))

(array([[82., 4.]]), array([43.])) (array([[77., 34.]]), array([55.5]))


2. Return a copy of the array.
def train_iterator_2(batch_size=1):
    x = np.zeros((batch_size, 2))
    while True:
        for i in range(batch_size):
            x[i][0] = np.random.randint(0, 100)
            x[i][1] = np.random.randint(0, 100)
        x_mean = (x[::,0] + x[::,1]) / 2
        yield np.copy(x), x_mean

it_2 = train_iterator_2()
print(next(it_2), next(it_2))

(array([[63., 31.]]), array([47.])) (array([[94., 25.]]), array([59.5]))


Now everything is fine. Let's move on.

Do we need expand_dims at all? Let's try removing that line; the new code looks like this:

def train_iterator(batch_size=64):
    while True:
        x = np.zeros((batch_size, 2))
        for i in range(batch_size):
            x[i][0] = np.random.randint(0, 100)
            x[i][1] = np.random.randint(0, 100)
        x_mean = (x[::,0] + x[::,1]) / 2
        yield [x], [x_mean]

Training works just as well, even though the data returned by the generator now has a different shape.

For example, it used to be [[49.]] and now it is [49.], but inside Keras this is apparently reshaped to the required dimension correctly.
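
For reference, a quick check of the shapes involved (my own snippet, not from the article):

import numpy as np

x = np.random.randint(0, 100, size=(64, 2)).astype(float)
x_mean = (x[::,0] + x[::,1]) / 2

print(x_mean.shape)                       # (64,)   - what the generator yields without expand_dims
print(np.expand_dims(x_mean, -1).shape)   # (64, 1) - what it yields with expand_dims,
                                          #           matching the (None, 1) output of Dense(1)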

So, we now know what a correct data generator should look like. Next, let's play with a Lambda layer and look at the behavior of expand_dims there.

We will not predict anything; we will simply compute the correct value right inside the Lambda layer.

The code is as follows:

def calc_mean(x):
    res = (x[::,0] + x[::,1]) / 2
    res = K.expand_dims(res, -1)
    return res

def create_model():
    x = Input(name='x', shape=(2,))
    x_mean = Lambda(lambda x: calc_mean(x), output_shape=(1,))(x)
    model = Model(inputs=x, outputs=x_mean)
    return model

We run it and see that everything is fine:

Epoch 1/5
100/100 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Epoch 2/5
100/100 [==============================] - 0s 2ms/step - loss: 0.0000e+00
Epoch 3/5
100/100 [==============================] - 0s 3ms/step - loss: 0.0000e+00

Now let's change our lambda function a little and remove expand_dims.

def calc_mean(x):
    res = (x[::,0] + x[::,1]) / 2
    return res

The model compiles without any shape errors, but the result is different: the loss is computed in some incomprehensible way. So expand_dims is necessary here; nothing happens automatically.

Epoch 1/5
100/100 [==============================] - 0s 3ms/step - loss: 871.6299
Epoch 2/5
100/100 [==============================] - 0s 3ms/step - loss: 830.2568
Epoch 3/5
100/100 [==============================] - 0s 2ms/step - loss: 830.8041

And if you look at what predict() returns, you can see that the shape is wrong: the output is [46.], while [[46.]] is expected.
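
One plausible explanation (my own guess, based on ordinary broadcasting rules rather than anything stated in the article) is that a (batch,) prediction and a (batch, 1) target broadcast against each other into a (batch, batch) matrix, so the MSE is averaged over all pairwise differences instead of over matching pairs:

import numpy as np

# illustrative sketch of the broadcasting effect, not the article's code
y_pred = np.array([46., 30., 55.])          # shape (3,)   - model output without expand_dims
y_true = np.array([[46.], [30.], [55.]])    # shape (3, 1) - targets with the extra dimension

diff = y_true - y_pred                      # broadcasts to shape (3, 3)
print(diff.shape)
print(np.mean(np.square(diff)))             # far from 0, even though each prediction is "exact"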

That is about it. Thanks to everyone who read this far. And be careful with the details: their effect can be significant.

Source: https://habr.com/ru/post/439038/