To parallelize matrix multiplication, I created three matrices:

    int** a = NULL; // allocated memory, filled with random values
    int** b = NULL; // allocated memory, filled with random values
    int** c = NULL; // allocated memory, filled the elements with zeros

Then I created the matrices that will live on the GPU:

    int** aGPU = NULL;
    int** bGPU = NULL;
    int** cGPU = NULL;
    size_t pitch;

I am trying to write into them the values held in matrices a and c, respectively, so that the calculation can be parallelized in the kernel.
I allocate memory for them and copy the data:

    cudaMallocPitch((void**)&aGPU, &pitch, N, N);
    cudaMallocPitch((void**)&bGPU, &pitch, N, N);
    cudaMallocPitch((void**)&cGPU, &pitch, N, N);
    cudaMemcpy2D(aGPU, N * sizeof(int), a, N * sizeof(int), N * sizeof(int), N, cudaMemcpyHostToDevice);
    cudaMemcpy2D(bGPU, N * sizeof(int), b, N * sizeof(int), N * sizeof(int), N, cudaMemcpyHostToDevice);
    cudaMemcpy2D(cGPU, N * sizeof(int), c, N * sizeof(int), N * sizeof(int), N, cudaMemcpyHostToDevice);

I have several questions:

1. What is pitch, why is it needed, and how should it be handled?
2. Am I allocating the memory correctly with cudaMalloc?
3. How do I copy the data from matrix a to matrix aGPU?

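To make the first question concrete, here is a host-only C sketch of what a pitched layout looks like. As documented for cudaMallocPitch, the width argument is given in bytes, each row is padded so that it starts at an aligned address, and the padded row size in bytes is returned as the pitch. The 512-byte alignment and the helper names round_up_pitch, pitched_at and pitched_alloc are assumptions made for illustration only; on a real device the pitch is chosen by the CUDA runtime and should always be taken from the call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical alignment; the CUDA runtime picks the real value. */
#define ROW_ALIGN 512

/* Round a row width in bytes up to the next multiple of ROW_ALIGN. */
size_t round_up_pitch(size_t width_bytes)
{
    assert(width_bytes > 0);
    return (width_bytes + ROW_ALIGN - 1) / ROW_ALIGN * ROW_ALIGN;
}

/* Address of element (row, col) in a pitched int matrix.
   The row offset is computed in bytes, the column offset in elements. */
int *pitched_at(void *base, size_t pitch, size_t row, size_t col)
{
    return (int *)((char *)base + row * pitch) + col;
}

/* Host stand-in for cudaMallocPitch: one zero-filled block with padded rows. */
void *pitched_alloc(size_t *pitch, size_t width_bytes, size_t height)
{
    *pitch = round_up_pitch(width_bytes);
    return calloc(height, *pitch);
}
```

Read this way, the width passed in the first snippet (N) would need to be N * sizeof(int), and the destination pitch passed to cudaMemcpy2D would be the value returned in pitch rather than N * sizeof(int). A kernel would use the same addressing as pitched_at, with the pitch passed in as a parameter.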
A minimal self-contained example:

    int** a = NULL;
    MakeMem(&a);
    initValue(a);
    //show(a);
    int** b = NULL;
    MakeMem(&b);
    initValue(b);
    int** c = NULL;
    MakeMem(&c);

    int** aGPU = NULL;
    int** bGPU = NULL;
    int** cGPU = NULL;
    size_t pitch;
    cudaMallocPitch((void**)&aGPU, &pitch, N * sizeof(int), N);
    cudaMallocPitch((void**)&bGPU, &pitch, N * sizeof(int), N);
    cudaMallocPitch((void**)&cGPU, &pitch, N * sizeof(int), N);
    cudaMemcpy2D(aGPU, pitch, a, N * sizeof(int), N * sizeof(int), N, cudaMemcpyHostToDevice);
    cudaMemcpy2D(bGPU, pitch, b, N * sizeof(int), N * sizeof(int), N, cudaMemcpyHostToDevice); // the copy error occurs here
    cudaMemcpy2D(cGPU, pitch, c, N * sizeof(int), N * sizeof(int), N, cudaMemcpyHostToDevice);
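
One possible cause of the copy error, assuming MakeMem allocates each row of the int** matrix with a separate call (its body is not shown): cudaMemcpy2D reads its source as one contiguous block whose rows start spitch bytes apart, whereas a, b and c are arrays of row pointers. The host-only C sketch below shows a common workaround, flattening the rows into a single buffer, together with a CPU stand-in for the row-by-row copy semantics. The names flatten_rows and copy2d_host are hypothetical and not part of the CUDA API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Copy an int** matrix (one allocation per row) into a single
   contiguous rows*cols buffer. The caller frees the result. */
int *flatten_rows(int **m, size_t rows, size_t cols)
{
    int *flat = malloc(rows * cols * sizeof *flat);
    if (flat == NULL)
        return NULL;
    for (size_t r = 0; r < rows; ++r)
        memcpy(flat + r * cols, m[r], cols * sizeof *flat);
    return flat;
}

/* CPU stand-in for the semantics of cudaMemcpy2D: copy `height` rows of
   `width` bytes each; the pitches (in bytes) give the distance between
   row starts in the destination and in the source. */
void copy2d_host(void *dst, size_t dpitch, const void *src, size_t spitch,
                 size_t width, size_t height)
{
    assert(width <= dpitch && width <= spitch);
    for (size_t r = 0; r < height; ++r)
        memcpy((char *)dst + r * dpitch,
               (const char *)src + r * spitch, width);
}
```

With a flat host buffer, the device call would take the form cudaMemcpy2D(aGPU, pitch, flat, N * sizeof(int), N * sizeof(int), N, cudaMemcpyHostToDevice), and the device pointers would more naturally be declared as int* than int**, since cudaMallocPitch returns a single block of memory.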