I need to speed up the transposition of a large matrix whose elements are stored sequentially in memory. The idea is to process the matrix in blocks, so that the data a block needs is still in the cache when it is used instead of having already been evicted.

The problem came up while writing the transposition code itself: the program hangs and never finishes. Here is the relevant code:

void transposematrixblocked(int **src, int **dst, int size) {
    for (int i = 0; i < size; i + BLOCKSIZE) {
        for (int j = 0; j < size; j + BLOCKSIZE) {
            for (int ini = 0; ini < BLOCKSIZE; ini++) {
                for (int inj = 0; inj < BLOCKSIZE; inj++) {
                    dst[i+ini][j+inj] = src[j+inj][i+ini];
                }
            }
        }
    }
}

Where did I go wrong, and how do I do this correctly?

2 answers

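In a for loop, the third expression has to update the loop variable, so it must be written as i += BLOCKSIZE. The expression i + BLOCKSIZE only computes a sum and discards it, so i never changes and the loop never ends.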

void transposematrixblocked(int **src, int **dst, int size) {
    for (int i = 0; i < size; i += BLOCKSIZE) {
        for (int j = 0; j < size; j += BLOCKSIZE) {
            for (int ini = 0; ini < BLOCKSIZE; ini++) {
                for (int inj = 0; inj < BLOCKSIZE; inj++) {
                    dst[i+ini][j+inj] = src[j+inj][i+ini];
                }
            }
        }
    }
}
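To make the difference concrete, here is a small sketch comparing the two expressions. The helper names and the value BLOCKSIZE = 4 are assumptions made for illustration; the question does not state the block size.

```c
#include <assert.h>

#define BLOCKSIZE 4  /* assumed value, not given in the question */

/* Hypothetical helper: evaluates i + BLOCKSIZE as a bare expression.
   The sum is computed and thrown away, so i comes back unchanged. */
int after_plain_plus(int i) {
    (void)(i + BLOCKSIZE);
    return i;
}

/* Hypothetical helper: i += BLOCKSIZE stores the sum back into i. */
int after_compound_assign(int i) {
    i += BLOCKSIZE;
    return i;
}

/* Counts how many block starts the corrected outer loop visits. */
int count_block_starts(int size) {
    assert(size >= 0);
    int count = 0;
    for (int i = 0; i < size; i += BLOCKSIZE) {
        count++;
    }
    return count;
}
```

With the first form the loop condition i < size stays true forever, which matches the hang described in the question. Many compilers can warn about such an expression (for example, GCC and Clang report an unused value when warnings are enabled).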

The main error really was in the increment expression: i + BLOCKSIZE instead of i += BLOCKSIZE.

      The final working code is below:

/* Transpose the square matrix src block by block and put the result in dst */
void transposematrixblocked(int **src, int **dst, int size) {
    for (int i = 0; i < size; i += BLOCKSIZE) {
        for (int j = 0; j < size; j += BLOCKSIZE) {
            /* Walk one BLOCKSIZE x BLOCKSIZE tile starting at (i, j) */
            for (int ini = i; ini < i + BLOCKSIZE; ini++) {
                for (int inj = j; inj < j + BLOCKSIZE; inj++) {
                    dst[ini][inj] = src[inj][ini];
                }
            }
        }
    }
}
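One caveat: the code above indexes up to i + BLOCKSIZE and j + BLOCKSIZE, so it stays in bounds only when size is a multiple of BLOCKSIZE. The following is a minimal sketch of one way to handle other sizes by clamping each tile to the matrix edge. The function name transpose_blocked_clamped, the helper min_int, and BLOCKSIZE = 4 are assumptions for illustration and are not part of the original code.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCKSIZE 4  /* assumed value, not given in the question */

/* Hypothetical helper: smaller of two ints. */
static int min_int(int a, int b) {
    return a < b ? a : b;
}

/* Blocked transpose of a size x size matrix stored as an array of row
   pointers. Each tile is clamped to the matrix edge, so size does not
   have to be a multiple of BLOCKSIZE. src and dst must not alias. */
void transpose_blocked_clamped(int **src, int **dst, int size) {
    assert(src != NULL && dst != NULL && size >= 0);
    for (int i = 0; i < size; i += BLOCKSIZE) {
        int i_end = min_int(i + BLOCKSIZE, size);
        for (int j = 0; j < size; j += BLOCKSIZE) {
            int j_end = min_int(j + BLOCKSIZE, size);
            for (int ini = i; ini < i_end; ini++) {
                for (int inj = j; inj < j_end; inj++) {
                    dst[ini][inj] = src[inj][ini];
                }
            }
        }
    }
}
```

When size is a multiple of BLOCKSIZE, the clamped bounds equal i + BLOCKSIZE and j + BLOCKSIZE, so this visits the same elements in the same order as the code above.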