I need to speed up the transposition of a large matrix whose elements are stored sequentially in memory. The idea is to process the matrix in blocks, so that the data a block needs is still in the cache when it is reused instead of having been evicted.
The problem came up while writing the transposition itself: the program hangs and never produces a result. Here is the relevant piece of code:

    void transposematrixblocked(int **src, int **dst, int size) {
        for (int i = 0; i < size; i + BLOCKSIZE) {
            for (int j = 0; j < size; j + BLOCKSIZE) {
                for (int ini = 0; ini < BLOCKSIZE; ini++) {
                    for (int inj = 0; inj < BLOCKSIZE; inj++) {
                        dst[i+ini][j+inj] = src[j+inj][i+ini];
                    }
                }
            }
        }
    }

Where did I go wrong, and what is the right way to do this?
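
One thing that stands out in the code above: the outer loops use `i + BLOCKSIZE` and `j + BLOCKSIZE` as their step. Those expressions compute a value and discard it, so `i` and `j` never change and the loops never terminate, which would explain the hang. Below is a minimal sketch of a possible fix, not a definitive implementation. It assumes `src` and `dst` are distinct arrays of `size` row pointers, each row holding `size` ints. The value of `BLOCKSIZE` and the helper `min_int` are illustrative and not from the original code. The sketch also clamps each block to the matrix edge, since the original inner loops would read and write out of bounds whenever `size` is not a multiple of `BLOCKSIZE`.

```c
/* Assumed value: the original post does not show how BLOCKSIZE is defined.
   Tune it for the cache of the target machine. */
#define BLOCKSIZE 32

/* Hypothetical helper, added for this sketch only. */
static int min_int(int a, int b) { return a < b ? a : b; }

/* Out-of-place blocked transpose: dst[r][c] = src[c][r].
   Assumes src and dst do not alias each other. */
void transposematrixblocked(int **src, int **dst, int size) {
    /* "+=" actually advances the block origin; "i + BLOCKSIZE" did not. */
    for (int i = 0; i < size; i += BLOCKSIZE) {
        for (int j = 0; j < size; j += BLOCKSIZE) {
            /* Clamp the block so a partial block at the edge stays in bounds. */
            int imax = min_int(i + BLOCKSIZE, size);
            int jmax = min_int(j + BLOCKSIZE, size);
            for (int ii = i; ii < imax; ii++) {
                for (int jj = j; jj < jmax; jj++) {
                    dst[ii][jj] = src[jj][ii];
                }
            }
        }
    }
}
```

If `size` is always a multiple of `BLOCKSIZE`, the clamping can be dropped and only the two `+=` changes are needed.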