To parallelize matrix multiplication, I created three matrices:

    int** a = NULL; // allocated memory, filled with random values
    int** b = NULL; // allocated memory, filled with random values
    int** c = NULL; // allocated memory, filled the elements with zeros

Then I declared the matrices that will live on the GPU:

    int** aGPU = NULL;
    int** bGPU = NULL;
    int** cGPU = NULL;
    size_t pitch;

I am trying to copy into them the values from the host matrices a, b and c respectively, so that the kernel can do the computation in parallel.

I allocate memory for them and copy the data:

    cudaMallocPitch((void**)&aGPU, &pitch, N, N);
    cudaMallocPitch((void**)&bGPU, &pitch, N, N);
    cudaMallocPitch((void**)&cGPU, &pitch, N, N);
    cudaMemcpy2D(aGPU, N*sizeof(int), a, N * sizeof(int), N * sizeof(int), N, cudaMemcpyHostToDevice);
    cudaMemcpy2D(bGPU, N*sizeof(int), b, N * sizeof(int), N * sizeof(int), N, cudaMemcpyHostToDevice);
    cudaMemcpy2D(cGPU, N*sizeof(int), c, N * sizeof(int), N * sizeof(int), N, cudaMemcpyHostToDevice);

I am interested in several things:

  1. What is pitch, why is it needed and how to manage it?

  2. Am I allocating memory with cudaMallocPitch correctly?

  3. How to copy data from matrix a to matrix aGPU?

A minimal self-contained example, as requested:

    int **a = NULL;
    MakeMem(&a);
    initValue(a);
    //show(a);
    int** b = NULL;
    MakeMem(&b);
    initValue(b);
    int** c = NULL;
    MakeMem(&c);

    int** aGPU = NULL;
    int** bGPU = NULL;
    int** cGPU = NULL;
    size_t pitch;

    cudaMallocPitch((void**)&aGPU, &pitch, N * sizeof(int), N);
    cudaMallocPitch((void**)&bGPU, &pitch, N * sizeof(int), N);
    cudaMallocPitch((void**)&cGPU, &pitch, N * sizeof(int), N);

    cudaMemcpy2D(aGPU, pitch, a, N * sizeof(int), N * sizeof(int), N, cudaMemcpyHostToDevice);
    cudaMemcpy2D(bGPU, pitch, b, N * sizeof(int), N * sizeof(int), N, cudaMemcpyHostToDevice); // this is where the copy error occurs
    cudaMemcpy2D(cGPU, pitch, c, N * sizeof(int), N * sizeof(int), N, cudaMemcpyHostToDevice);

    1 answer

    What is pitch, why is it needed and how to manage it?

    The pitch is the size in bytes of a single row of a two-dimensional array as it is laid out in memory. To speed up memory access when matrices are processed row by row, the start of each row is aligned to a certain boundary (512 bytes is typical, but it may differ depending on the device), so a row may be followed by padding. That is, the address of the matrix element [Row][Column] is calculated by the formula:

     T* pElement = (T*)((char*)BaseAddress + Row * pitch) + Column; 
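
    As a rough illustration of the formula above, here is a small host-side sketch. The helper names paddedPitch and elementAt are hypothetical, and the 512-byte alignment is only the typical value mentioned above; on a real device the pitch is whatever cudaMallocPitch reports.

```cpp
#include <cassert>
#include <cstddef>

// Round a row width in bytes up to the next multiple of `alignment`.
// The 512-byte default is an ASSUMPTION mirroring the "typical" value in the text;
// with CUDA, use the pitch returned by cudaMallocPitch instead of computing it.
inline std::size_t paddedPitch(std::size_t widthBytes, std::size_t alignment = 512) {
    assert(alignment > 0);
    return (widthBytes + alignment - 1) / alignment * alignment;
}

// Address of element [row][column] in a pitched buffer.
// The row offset is applied in bytes, the column offset in elements of T.
template <typename T>
T* elementAt(void* base, std::size_t pitch, std::size_t row, std::size_t column) {
    assert(pitch >= (column + 1) * sizeof(T));  // the column must fit inside one row
    char* rowStart = static_cast<char*>(base) + row * pitch;
    return reinterpret_cast<T*>(rowStart) + column;
}
```

    With this sketch, a row of 100 int values (400 bytes when int is 4 bytes wide) gets a pitch of 512 bytes, so element [2][1] sits at byte offset 2 * 512 + 1 * sizeof(int).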

    Am I allocating memory with cudaMallocPitch correctly?

    Most likely not: you forgot to multiply by the size of the element type. The row width is specified in bytes:

     cudaMallocPitch((void**)&aGPU, &pitch, N*sizeof(int), N); 

    How to copy data from matrix a to matrix aGPU?

    This is almost correct, except that the second argument must be the pitch returned by cudaMallocPitch, since that is the actual row size of the destination:

     cudaMemcpy2D(aGPU, pitch, a, N * sizeof(int), N * sizeof(int), N, cudaMemcpyHostToDevice); 
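
    Putting the two corrected calls together, a minimal round-trip sketch might look as follows. It assumes the host matrix is stored as one contiguous row-major block (a std::vector<int> here) rather than as an int** array of row pointers, and the function name roundTripPitched is hypothetical. The device pointer is declared as int* because a pitched allocation is a single flat buffer.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Copies an n x n int matrix into a pitched device buffer and back again.
// Returns true if every CUDA call succeeded and the data survived the round trip.
// ASSUMPTION: the host matrix is ONE contiguous row-major block, not an int**.
bool roundTripPitched(int n) {
    if (n <= 0) return false;

    std::vector<int> host(static_cast<std::size_t>(n) * n);
    for (std::size_t i = 0; i < host.size(); ++i) host[i] = static_cast<int>(i);

    const std::size_t widthBytes = n * sizeof(int);  // row width in BYTES
    int* dev = nullptr;
    std::size_t pitch = 0;                           // filled in by cudaMallocPitch
    if (cudaMallocPitch(reinterpret_cast<void**>(&dev), &pitch, widthBytes, n) != cudaSuccess)
        return false;

    // Destination pitch is the device pitch; source pitch is the packed host row size.
    cudaError_t err = cudaMemcpy2D(dev, pitch, host.data(), widthBytes,
                                   widthBytes, n, cudaMemcpyHostToDevice);

    // Copy back with the pitches swapped: device pitch on the source side this time.
    std::vector<int> back(host.size(), -1);
    if (err == cudaSuccess)
        err = cudaMemcpy2D(back.data(), widthBytes, dev, pitch,
                           widthBytes, n, cudaMemcpyDeviceToHost);

    cudaFree(dev);
    return err == cudaSuccess && back == host;
}
```

    Checking the returned cudaError_t of each call, as this sketch does, usually makes it easier to tell a failed allocation apart from a failed copy.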


    Documentation: cudaMallocPitch(), cudaMemcpy2D()

    • Anyway, I still get a memory exception from cudaMemcpy2D - Elvin
    • @Elvin, please post a minimal reproducible example ... - Fat-Zer
    • Added details to the question - Elvin
    • @Elvin, firstly, the example needs to be more self-contained (ideally to the point of copy, build, reproduce) ... and secondly, you almost certainly have an array of pointers, not a two-dimensional array (depending on what MakeMem does) - Fat-Zer
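
    To illustrate the last comment: if MakeMem allocates every row separately (an assumption, since its code is not shown), then a points to an array of N row pointers rather than to N * N contiguous int values. cudaMemcpy2D would then read pointer values, and run past the end of that pointer array, instead of reading matrix data. One possible workaround, sketched below with a hypothetical helper flattenRows, is to pack the rows into a contiguous buffer before copying.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Packs an n x n matrix stored as an array of row pointers (int**)
// into one contiguous row-major buffer.
// ASSUMPTION: every rows[r] points to at least n valid int values.
std::vector<int> flattenRows(int* const* rows, std::size_t n) {
    assert(rows != nullptr);
    std::vector<int> flat;
    flat.reserve(n * n);
    for (std::size_t r = 0; r < n; ++r) {
        assert(rows[r] != nullptr);
        flat.insert(flat.end(), rows[r], rows[r] + n);  // append row r
    }
    return flat;
}
```

    The resulting flat.data() can then serve as the source argument of cudaMemcpy2D with a source pitch of N * sizeof(int).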