In short, I wrote a solver for a partial differential equation, in both a parallel version (using OpenMP) and a serial one. The odd thing is that with a Debug build the sequential algorithm runs in about 75-85 seconds and the parallel one in 50-55 seconds, which seems fine. But as soon as I compile in Release mode, the parallel algorithm takes 25-30 seconds while the sequential one takes only 18-20 seconds !!! Has anyone run into the same problem? I won't post all the code, it would be too much to wade through. Here is the relevant piece:

void DifferenceScheme::solveOMP() {
    n = round((xMax - xMin) / xStep);
    m = round((tMax - tMin) / tStep);
    u = dMatrix(n + 1, dVector(m + 1, initValue));
    TridiagonalMatrix tm(n + 1);
    dVector F(n + 1);
    Progonka progonka(tm, F);
    double error;
    workTime = clock();
    #pragma omp parallel for
    for (int i = 0; i <= n; i++) {
        u[i][0] = w(xMin + i * xStep, tMin);
    }
    #pragma omp parallel for
    for (int i = 0; i <= m; i++) {
        u[0][i] = w(xMin, tMin + i * tStep);
        u[n][i] = w(xMax, tMin + i * tStep);
    }
    for (int j = 1; j <= m; j++) {
        do {
            #pragma omp parallel for
            for (int i = 1; i < n; i++) {
                tm.lower[i - 1] = dLeft(i, j);
                tm.main[i] = dCenter(i, j);
                tm.upper[i] = dRight(i, j);
                F[i] = -1 * f(i, j);
            }
            tm.main[0] = tm.main[n] = 1;
            F[0] = -1 * (u[0][j] - w(xMin, tMin + j * tStep));
            F[n] = -1 * (u[n][j] - w(xMax, tMin + j * tStep));
            progonka.solveOMP();
            #pragma omp parallel for
            for (int i = 1; i < n; i++) {
                u[i][j] += progonka.x[i];
            }
            error = 0;
            for (int i = 1; i < n; i++)
                if (fabs(progonka.x[i]) > error)
                    error = fabs(progonka.x[i]);
        } while (error > accuracy);
    }
    workTime = clock() - workTime;
}
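One thing worth checking about the timing itself: clock() measures processor time, and on many platforms (e.g. glibc on Linux) that accumulates across all threads, so a parallel run can report more clock() ticks than a serial one even when it finishes sooner on the wall clock. A minimal sketch of wall-clock timing with std::chrono (the helper name is mine, not from the code above):

```cpp
#include <chrono>

// Measures the wall-clock duration of `work` in seconds.
// Unlike clock(), this does not sum CPU time across threads,
// so it is the right metric for serial-vs-parallel comparisons.
template <typename F>
double wallSeconds(F&& work) {
    auto t0 = std::chrono::steady_clock::now();
    work();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

OpenMP also provides omp_get_wtime(), which gives wall time directly without extra includes.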
  • Are you bothered by the drop in run time in Release, or by the parallel implementation being slower in Release? - Vladimir Martyanov
  • While it is computing, do you see load on all the cores? By the way, how many cores do you have? - gbg
  • @VladimirMartyanov What confuses me is that in Release the parallel implementation is slower. - van9petryk
  • @gbg No, I didn't watch the load, but you can hear the fan spin up. 2 cores. Can it be checked through the Task Manager, or do you need special software? - van9petryk
  • @van9petryk The Task Manager is quite enough - it has per-core graphs. And how did you measure the time? - gbg

2 answers

I believe your N (the matrix size) is too small, so the cost of starting the threads is comparable to the execution time itself.

To verify this, run your program with a larger N.
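To make the point concrete: launching a thread team is a roughly fixed cost, so for small N it can swamp the useful work. A rough sketch of measuring that effect (using std::thread rather than OpenMP, but the principle is the same; the function name and thread count are illustrative, not from the asker's code):

```cpp
#include <chrono>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums n ones on 4 threads and returns the wall time in seconds.
// Thread creation is a fixed cost: for small n it dominates the
// measurement, while for large n it is amortized over the work.
double timedParallelSum(std::size_t n) {
    const int T = 4;
    std::vector<double> data(n, 1.0);
    std::vector<double> partial(T, 0.0);

    auto t0 = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (int t = 0; t < T; t++) {
        pool.emplace_back([&, t] {
            // Each thread sums its own contiguous slice.
            std::size_t lo = n * t / T, hi = n * (t + 1) / T;
            partial[t] = std::accumulate(data.begin() + lo,
                                         data.begin() + hi, 0.0);
        });
    }
    for (auto& th : pool) th.join();
    return std::chrono::duration<double>(
        std::chrono::steady_clock::now() - t0).count();
}
```

Comparing the time for a tiny n against a large n shows how much of the small-n time is pure start-up overhead rather than computation.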

    You are right, but my poorly chosen parallelization is also part of the reason - van9petryk

In the end, my mistake was that I wrote #pragma omp parallel for everywhere, so the thread team was constantly being created and destroyed. By successively changing the parallelization I managed to bring the parallel program's time down below the sequential one. Now, right after do { I open a single #pragma omp parallel region, and put #pragma omp for next to each loop; this way, threads are created and destroyed only at the beginning and at the end of the do {} block.
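The restructuring described above can be sketched like this (a simplified stand-in for the real loops: correction plays the role of progonka.x, the function name is illustrative, and both vectors are assumed to have the same length):

```cpp
#include <cmath>
#include <vector>

// One correction step with a single parallel region wrapping both
// work-sharing loops. The thread team is entered once here, instead
// of once per "#pragma omp parallel for".
double applyCorrection(const std::vector<double>& correction,
                       std::vector<double>& u) {
    double error = 0.0;
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < (int)u.size(); i++)
            u[i] += correction[i];

        // A max-reduction replaces the serial error-search loop.
        #pragma omp for reduction(max : error)
        for (int i = 0; i < (int)correction.size(); i++)
            if (std::fabs(correction[i]) > error)
                error = std::fabs(correction[i]);
    }
    return error;
}
```

Note that modern OpenMP runtimes often keep the worker threads alive between parallel regions anyway, so much of the win here is avoiding the per-loop fork/join and scheduling overhead; measuring with omp_get_wtime() on your own data is the only reliable check.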