Recently I have been studying TensorFlow. The examples described in the books work, but they use the MNIST datasets. When I feed in my own tiny dataset, however, the model does not converge. Earlier I wrote my own network without the TensorFlow library, and it worked on small test sets. An example of such input is an array like [10,20,30,40] and an output like [0.1,0.4,0.5], i.e. 4 input neurons and 3 output neurons. The model trains; we then check it with the input [10,20,30,40] and get [0.1,0.4,0.5]. But in practice the model does not converge: it starts learning and then stalls. What is wrong with the network and the input? It looks like vanishing gradients, but I am not sure. How can this be fixed?

    import tensorflow as tf
    import numpy as np
    from keras.models import Sequential
    from keras.layers import Input, Dense, Activation
    from sklearn import preprocessing

    model = Sequential()
    model.add(Dense(10, input_shape=(4,), activation="elu", kernel_initializer="uniform"))
    model.add(Dense(10, activation="elu", kernel_initializer="uniform"))
    model.add(Dense(3, activation="elu", kernel_initializer="uniform"))
    model.compile(loss="categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])

    def data():
        X = np.array([[40, 80, 30, 60], [100, 40, 20, 80], [90, 190, 10, 15]])
        Y = np.array([[10, 20, 30], [10, 40, 60], [80, 90, 100]])
        X = X / 100
        Y = Y / 100
        return X, Y

    x_train, Y_train = data()
    model.fit(x_train, Y_train, batch_size=3, epochs=300, verbose=1)

    a = np.array([40, 80, 30, 60]) / 100
    predict_dataset = tf.convert_to_tensor(x_train, dtype=tf.float32)
    reshy = a.reshape((1, -1))
    print("test")
    print(reshy)
    prediction = model.predict(reshy)
    print("prediction")
    print(prediction)
    print("x_train")
    print(x_train)
    print("Y_train")
    print(Y_train)
    print("Hello")
 • Everything here is mixed up in a heap… What is this model supposed to predict? Is this a regression or a classification model? - MaxU

1 answer

Judging by the tensor Y, you have a regression problem. For a regression problem, the activation function you chose for the last (output) layer is a poor fit. In addition, the loss function loss="categorical_crossentropy" is suitable only for classification tasks.
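As a side-by-side sketch of that distinction (the layer widths here are arbitrary placeholders, not values from the question): a classifier typically ends in a softmax layer trained with cross-entropy, while a regressor ends in a linear output layer trained with a squared-error loss.

    from keras.models import Sequential
    from keras.layers import Dense

    # Classification head: probability distribution over classes + cross-entropy
    clf = Sequential([
        Dense(16, activation='relu', input_shape=(4,)),
        Dense(3, activation='softmax'),
    ])
    clf.compile(loss='categorical_crossentropy', optimizer='sgd')

    # Regression head: linear (unbounded) output + squared-error loss
    reg = Sequential([
        Dense(16, activation='relu', input_shape=(4,)),
        Dense(3),  # no activation = linear output
    ])
    reg.compile(loss='mean_squared_error', optimizer='sgd')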

Here is the corrected regression model:

    X = np.array([[40, 80, 30, 60], [100, 40, 20, 80], [90, 190, 10, 15]])
    Y = np.array([[10, 20, 30], [10, 40, 60], [80, 90, 100]])

    model = Sequential()
    model.add(Dense(16, input_shape=(X.shape[1],)))
    model.add(Activation('relu'))
    model.add(Dense(16))
    model.add(Activation('relu'))
    model.add(Dense(3))

    model.compile(loss='mean_squared_error', optimizer='Adamax', metrics=['mae'])
    model.fit(X, Y, epochs=500, verbose=1)

Output:

    ...
    Epoch 498/500
    3/3 [==============================] - 0s 669us/step - loss: 4.5900e-07 - mean_absolute_error: 5.2166e-04
    Epoch 499/500
    3/3 [==============================] - 0s 503us/step - loss: 4.2649e-07 - mean_absolute_error: 5.0259e-04
    Epoch 500/500
    3/3 [==============================] - 0s 334us/step - loss: 4.0100e-07 - mean_absolute_error: 4.8796e-04
    Out[49]: <keras.callbacks.History at 0x1ec427eac88>
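To sanity-check the fit, you can push a training row back through the model; a minimal sketch (the expected output [10, 20, 30] is the first row of Y above):

    import numpy as np

    # Quick check on a training row (not a substitute for a held-out test set)
    sample = np.array([[40, 80, 30, 60]])  # shape (1, 4): Keras expects a batch dimension
    print(model.predict(sample))           # should come out close to [10, 20, 30]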

P.S. To build a more adequate model (one that is less prone to overfitting), you need to add regularization and use a much larger dataset for training and testing.
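A minimal sketch of what that regularization could look like, assuming the same 4-feature regression setup; the L2 factor and dropout rate are placeholder values that would need tuning on real data:

    from keras.models import Sequential
    from keras.layers import Dense, Dropout
    from keras.regularizers import l2

    model = Sequential()
    # L2 weight penalty discourages large weights; Dropout randomly zeroes units
    model.add(Dense(16, activation='relu', input_shape=(4,), kernel_regularizer=l2(0.01)))
    model.add(Dropout(0.2))
    model.add(Dense(16, activation='relu', kernel_regularizer=l2(0.01)))
    model.add(Dense(3))  # linear output for regression
    model.compile(loss='mean_squared_error', optimizer='Adamax', metrics=['mae'])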