CUDA: is there a limit on the number of arrays processed by one kernel?

Question

Maybe this is a stupid question, but I will ask it anyway. Here is the situation. There are ten vectors of the same size. The first is filled with random numbers; the rest are only declared. I allocate memory on the device for each of them, making sure each allocation matches the size of its vector, and copy the first vector to the device. Then I launch a kernel that simply adds one to each element of a vector and writes the result into the next vector, 10 times over. Finally I copy the last vector from the device to the host and print its elements to the console one by one.

It all seems simple and primitive, but there is a problem I cannot explain: the results are correct only up to, for example, the fifth vector, and the elements of all subsequent vectors come out as zeros. I would be extremely grateful if someone could point out my mistake.
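Since the original code is not shown, here is a minimal sketch of the pipeline described above, with the status of every CUDA call checked. It is not the asker's code and is not claimed to be the fix. The names `CUDA_CHECK`, `addOne` and `runChain`, the `float` element type, the vector length, the block size of 256, and the choice of one kernel launch per step are all assumptions made for illustration. With ten vectors there are nine "add one" steps between the first and the last.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: print the CUDA error and exit if a runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// out[i] = in[i] + 1; the bounds guard keeps surplus threads from writing.
__global__ void addOne(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + 1.0f;
}

// Runs the chain over kVectors device arrays of n floats each.
// Returns true if every element of the last vector equals the
// corresponding input element plus (kVectors - 1).
bool runChain(int n, int kVectors) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    // Fill the first vector with small random integers (exact in float).
    std::vector<float> input(n), result(n);
    for (int i = 0; i < n; ++i) input[i] = static_cast<float>(std::rand() % 100);

    // One device allocation per vector, each of the same size.
    std::vector<float*> dev(kVectors, nullptr);
    for (int v = 0; v < kVectors; ++v) CUDA_CHECK(cudaMalloc(&dev[v], bytes));
    CUDA_CHECK(cudaMemcpy(dev[0], input.data(), bytes, cudaMemcpyHostToDevice));

    const int threads = 256;                          // assumed block size
    const int blocks = (n + threads - 1) / threads;   // enough blocks to cover n
    for (int v = 0; v + 1 < kVectors; ++v) {
        addOne<<<blocks, threads>>>(dev[v], dev[v + 1], n);
        CUDA_CHECK(cudaGetLastError());               // catches launch errors
    }
    CUDA_CHECK(cudaDeviceSynchronize());              // catches execution errors

    CUDA_CHECK(cudaMemcpy(result.data(), dev[kVectors - 1], bytes,
                          cudaMemcpyDeviceToHost));
    for (int v = 0; v < kVectors; ++v) CUDA_CHECK(cudaFree(dev[v]));

    for (int i = 0; i < n; ++i)
        if (result[i] != input[i] + static_cast<float>(kVectors - 1)) return false;
    return true;
}

int main() {
    bool ok = runChain(1 << 20, 10);  // vector length is an assumption
    std::printf("%s\n", ok ? "all elements match" : "mismatch in last vector");
    return ok ? 0 : 1;
}
```

A kernel launch that fails does not stop the program by itself, and the output buffer is simply left as it was, so checking `cudaGetLastError()` after each launch and the result of `cudaDeviceSynchronize()` is one way to find out at which step things go wrong.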

No, it does not crash; the whole calculation takes about 1 ms.
