There is a neural network with the following structure:
I train the network sequentially, driving a set of training data through it, namely:
1) Forward pass of the signal (inputs) through each weight matrix (yes, yes: the summation, the activation function, everything matters ...)
2) Calculation of the output-layer error (comparing outputs with targets)
3) Calculation of the hidden-layer errors (the derivative of the activation function, taking into account how the weight matrix spreads the error, all as it should be)
4) Correction of the weights (using the computed error)
During training these steps are repeated N times (100 ... 1,000,000).
Question: how can I organize the computations in parallel? The learning cycle for each set of input values works with the current values of the weight matrices: to update the weights for one set of inputs (x1 ... xn), I have to compute the outputs (y1 ... ym) with those weights, apply the correction, and only then drive the next inputs (x ...) through, together with the corresponding set of targets (y ...), of course.
// The neural network is already written in C/C++; the plan is to use several threads to speed it up. (Please don't just throw CUDA links at me: I know it is similar to C and everything works there, but I can't understand what exactly is computed in parallel and how those partial results affect each other to produce the resulting set of weight matrices.)
UPD: roughly how it works now (one abstract layer):
for (out = 0; out < outputsNeuerons; out++) {
    float sum = 0.0;
    for (inp = 0; inp < inputsNeuerons; inp++) {
        // inputs - what comes into the layer
        sum += inputs[inp] * weightsMatrix[inp][out];
    }
    sum += weightsMatrix[inputsNeuerons][out]; // bias row
    // outputs - what comes out of the layer
    outputs[out] = sigmoid(sum);
}