Not really a question, but rather an illustrative example for the question "Optimization methods based on efficient use of hardware", section "Bypassing data access latency", subsection "Grouping the needed data".
A real-life example: a particle system. The first option is an array of structures (the two left columns in the results). The second is two arrays whose structures split the data by purpose: the main data are in one structure, the auxiliary data in the other (the two right columns).
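To make the difference between the two layouts concrete, here is a minimal sketch of both with their sizes. It is a portable rewrite for illustration, not the post's code: std::uint32_t stands in for COLORREF (a 32-bit DWORD in the Windows headers), and the constant names kBytesPerParticleFull and kBytesPerParticleSplit are made up. Because every member is 4 bytes wide, mainstream compilers insert no padding, which gives 44, 24 and 20 bytes.

```cpp
#include <cstddef>
#include <cstdint>

typedef float TVector[3];

// Option 1: one record per particle with every field together.
struct TSparkle_Full {
    TVector coords, speed;
    std::uint32_t color;            // stands in for COLORREF
    float startSize;
    float startLuminTime, fadingTime;
    float lifeTime;
};

// Option 2: the same fields split into two records kept in parallel arrays.
struct TSparkle1 {                  // main data: all the position update needs
    TVector coords, speed;
};
struct TSparkle2 {                  // auxiliary data
    std::uint32_t color;
    float startSize;
    float startLuminTime, fadingTime;
    float lifeTime;
};

// Splitting only moves fields around; no data is added or lost.
static_assert(sizeof(TSparkle_Full) == sizeof(TSparkle1) + sizeof(TSparkle2),
              "split layout holds the same data");

// The position update reads and writes only coords and speed. In an array of
// TSparkle_Full the 20 unused bytes between particles are shorter than a
// 64-byte cache line, so a sequential pass still loads every line of the
// array, i.e. the whole record per particle. The split layout loads only
// the TSparkle1 array.
constexpr std::size_t kBytesPerParticleFull  = sizeof(TSparkle_Full);
constexpr std::size_t kBytesPerParticleSplit = sizeof(TSparkle1);
```

On such a compiler the position update streams 44 bytes per particle with the first layout and 24 bytes with the second.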
Naturally, the example does not contain the many different data-processing procedures of a real program; it only demonstrates the change in performance on a small fragment of a real system. That is, suppose we have a large, complex software system (say, Folding@home) in which a large amount of data is stored as an array of records. One typical computation on this data, whose speed has become a problem, processes only a subset of the data. This technique can speed up that computation. It comes at a price: less elegant code and a slight slowdown of some other data-processing procedures. As usual with optimization, it is a trade-off.
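The "less elegant code" part of the trade-off can be sketched as follows. This is a hypothetical wrapper, not something from the post: the class SparkleSystem and its methods are invented for illustration, and std::uint32_t replaces COLORREF. It keeps particle i at index i of both arrays, so the accelerated loop walks only the compact main array, while any operation that changes the set of particles has to update both arrays in lockstep.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef float TVector[3];

struct TSparkle1 {                  // main data, used by the hot loop
    TVector coords, speed;
};
struct TSparkle2 {                  // auxiliary data
    std::uint32_t color;            // stands in for COLORREF
    float startSize;
    float startLuminTime, fadingTime;
    float lifeTime;
};

// Hypothetical wrapper: particle i lives at index i of both arrays.
class SparkleSystem {
public:
    std::size_t add(const TSparkle1& mainPart, const TSparkle2& auxPart) {
        mainData_.push_back(mainPart);
        auxData_.push_back(auxPart);
        return mainData_.size() - 1;
    }

    // The accelerated computation: it touches only the main-data array.
    void advance() {
        for (TSparkle1& s : mainData_)
            for (int k = 0; k < 3; ++k)
                s.coords[k] += s.speed[k];
    }

    // The price: adding, removing or reordering particles must change both
    // arrays identically, or the indices drift apart. Here: unordered
    // removal that moves the last particle into slot i.
    void remove(std::size_t i) {
        assert(i < mainData_.size() && mainData_.size() == auxData_.size());
        mainData_[i] = mainData_.back();
        mainData_.pop_back();
        auxData_[i] = auxData_.back();
        auxData_.pop_back();
    }

    std::size_t size() const { return mainData_.size(); }
    const TSparkle1& mainAt(std::size_t i) const { return mainData_[i]; }
    const TSparkle2& auxAt(std::size_t i) const { return auxData_[i]; }

private:
    std::vector<TSparkle1> mainData_;
    std::vector<TSparkle2> auxData_;
};
```

A procedure that needs both halves of a particle now reads mainAt(i) and auxAt(i) from two separate places, which is where the slight slowdown of other procedures can come from.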
    // Plain Win32 console program; no precompiled header required.
    #include <stdio.h>     // printf
    #include <stdlib.h>    // rand
    #include <string.h>    // memset
    #include <Windows.h>   // QueryPerformanceCounter, COLORREF

    const int size = 1 << 24;      // number of particles
    const int repeatCount = 10;    // passes of the "add speed" loop

    typedef float TVector[3];

    // Option 1: array of structures, all particle data together.
    struct TSparkle_Full
    {
        TVector coords, speed;
        COLORREF color;
        float startSize;
        float startLuminTime, fadingTime;
        float lifeTime;
    };
    TSparkle_Full g_sparkles_Full[size];

    // Option 2: data split by purpose into two parallel arrays.
    struct TSparkle1               // main data, used by the position update
    {
        TVector coords, speed;
    };
    struct TSparkle2               // auxiliary data
    {
        COLORREF color;
        float startSize;
        float startLuminTime, fadingTime;
        float lifeTime;
    };
    TSparkle1 g_sparkles1[size];
    TSparkle2 g_sparkles2[size];

    void test1()
    {
        LARGE_INTEGER start, end, freq;
        QueryPerformanceFrequency(&freq);
        int i;

        // "Fill": clear the array and assign random speeds.
        QueryPerformanceCounter(&start);
        memset(g_sparkles_Full, 0, sizeof(g_sparkles_Full));
        for (i = 0; i < size; i++)
        {
            g_sparkles_Full[i].speed[0] = float(rand());
            g_sparkles_Full[i].speed[1] = float(rand());
            g_sparkles_Full[i].speed[2] = float(rand());
        }
        QueryPerformanceCounter(&end);
        printf("%16.6g", double(end.QuadPart - start.QuadPart) / freq.QuadPart);

        // "Add speed": move every particle repeatCount times.
        QueryPerformanceCounter(&start);
        for (int n = 0; n < repeatCount; n++)
            for (i = 0; i < size; i++)
            {
                g_sparkles_Full[i].coords[0] += g_sparkles_Full[i].speed[0];
                g_sparkles_Full[i].coords[1] += g_sparkles_Full[i].speed[1];
                g_sparkles_Full[i].coords[2] += g_sparkles_Full[i].speed[2];
            }
        QueryPerformanceCounter(&end);
        printf("%16.6g", double(end.QuadPart - start.QuadPart) / freq.QuadPart);
    }

    void test2()
    {
        LARGE_INTEGER start, end, freq;
        QueryPerformanceFrequency(&freq);
        int i;

        // "Fill 2": clear both arrays and assign random speeds.
        QueryPerformanceCounter(&start);
        memset(g_sparkles1, 0, sizeof(g_sparkles1));
        memset(g_sparkles2, 0, sizeof(g_sparkles2));
        for (i = 0; i < size; i++)
        {
            g_sparkles1[i].speed[0] = float(rand());
            g_sparkles1[i].speed[1] = float(rand());
            g_sparkles1[i].speed[2] = float(rand());
        }
        QueryPerformanceCounter(&end);
        printf("%16.6g", double(end.QuadPart - start.QuadPart) / freq.QuadPart);

        // "Add speed 2": the same update, touching only the main-data array.
        QueryPerformanceCounter(&start);
        for (int n = 0; n < repeatCount; n++)
            for (i = 0; i < size; i++)
            {
                g_sparkles1[i].coords[0] += g_sparkles1[i].speed[0];
                g_sparkles1[i].coords[1] += g_sparkles1[i].speed[1];
                g_sparkles1[i].coords[2] += g_sparkles1[i].speed[2];
            }
        QueryPerformanceCounter(&end);
        printf("%16.6g\n", double(end.QuadPart - start.QuadPart) / freq.QuadPart);
    }

    int main()
    {
        printf("%16s%16s%16s%16s\n", "Fill", "Add speed", "Fill 2", "Add speed 2");
        // Each iteration prints one line of results.
        for (int i = 0; i < 5; i++)
        {
            test1();
            test2();
        }
        return 0;
    }
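For readers without Windows, the measured loop can be sketched portably. This is an assumption-laden rewrite, not the author's benchmark: std::chrono::steady_clock replaces QueryPerformanceCounter, std::vector replaces the static arrays, std::uint32_t replaces COLORREF, and the helper names fillSpeeds and timeAddSpeed are hypothetical. One function template serves both layouts, so the only difference between the two timings is the record the loop walks over.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <vector>

typedef float TVector[3];

struct TSparkle_Full {              // option 1: all fields together
    TVector coords, speed;
    std::uint32_t color;            // stands in for COLORREF
    float startSize;
    float startLuminTime, fadingTime;
    float lifeTime;
};

struct TSparkle1 {                  // option 2, main data only; the auxiliary
    TVector coords, speed;          // array is never read by this loop, so it
};                                  // is left out of the sketch

// "Fill": reset coordinates and give every particle a random speed.
template <class Particle>
void fillSpeeds(std::vector<Particle>& particles) {
    for (Particle& p : particles)
        for (int k = 0; k < 3; ++k) {
            p.coords[k] = 0.0f;
            p.speed[k] = float(std::rand());
        }
}

// "Add speed": repeatCount passes of coords += speed; returns elapsed seconds.
template <class Particle>
double timeAddSpeed(std::vector<Particle>& particles, int repeatCount) {
    assert(repeatCount >= 0);
    auto start = std::chrono::steady_clock::now();
    for (int n = 0; n < repeatCount; ++n)
        for (Particle& p : particles)
            for (int k = 0; k < 3; ++k)
                p.coords[k] += p.speed[k];
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

For a run comparable to the post, create a std::vector<TSparkle_Full> and a std::vector<TSparkle1> with 1 << 24 elements each, call fillSpeeds on both, and compare timeAddSpeed(full, 10) with timeAddSpeed(split, 10) in an optimized build.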
Test setup: MS Visual Studio 2010, an Intel Core i5 at about 2.5 GHz, DDR3 memory. The pair of tests is run five times, and each run produces one line of the results (times in seconds). I do not claim accurate measurements (background processes, Hyper-Threading, and so on), but the effect is clearly visible: updating the coordinates from the speeds is almost twice as fast with the split layout (compare the "Add speed" and "Add speed 2" columns):
                Fill       Add speed          Fill 2     Add speed 2
             2.20742         2.67801         1.66939         1.44776
             1.63304         2.62465         1.66015         1.45007
             1.65731         2.61806         1.63312         1.43443
              1.6489         2.64562         1.70421         1.48133
             1.65934         2.67894         1.63182         1.47675
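A quick check of these numbers, under the assumption that the add-speed loop is limited by memory bandwidth: the ratio of the mean "Add speed" times over the five runs is close to the ratio of bytes the loop has to bring in per particle. TSparkle_Full holds ten floats and a 4-byte COLORREF (44 bytes), while TSparkle1 holds six floats (24 bytes):

$$
\frac{\bar t_{\text{Add speed}}}{\bar t_{\text{Add speed 2}}}
= \frac{2.649\ \text{s}}{1.458\ \text{s}} \approx 1.82,
\qquad
\frac{\operatorname{sizeof}(\text{TSparkle\_Full})}{\operatorname{sizeof}(\text{TSparkle1})}
= \frac{44\ \text{B}}{24\ \text{B}} \approx 1.83
$$

The agreement is consistent with that assumption, though it does not prove it.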
And if you add a std::string member to the particle structure, the difference becomes threefold :)
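One rough way to see why the gap grows: a std::string member enlarges every record the position update has to walk through, while TSparkle1 stays at 24 bytes. The sketch assumes the string is added to the full record; the struct name TSparkle_FullWithName, the member name and the helper recordSizeRatio are hypothetical, and std::uint32_t replaces COLORREF. sizeof(std::string) is implementation-defined (commonly 24 or 32 bytes with 64-bit standard libraries), so the exact ratio depends on the compiler, its library and the target.

```cpp
#include <cstdint>
#include <string>

typedef float TVector[3];

struct TSparkle1 {                    // split layout: main data only
    TVector coords, speed;
};

// Hypothetical: the full particle record with a std::string added.
struct TSparkle_FullWithName {
    TVector coords, speed;
    std::uint32_t color;              // stands in for COLORREF
    float startSize;
    float startLuminTime, fadingTime;
    float lifeTime;
    std::string name;                 // hypothetical extra field
};

// Record size of the array-of-structures layout relative to the split
// layout's main array. Only the string object itself counts here; its heap
// buffer, if any, is not touched by the position update.
constexpr double recordSizeRatio() {
    return double(sizeof(TSparkle_FullWithName)) / double(sizeof(TSparkle1));
}
```

With an 8-byte-aligned std::string of 24 or 32 bytes, the record becomes 72 or 80 bytes, a ratio of 3.0 to about 3.3, which would be in line with the threefold difference above.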