It turns out that everything has already been invented before us. What we needed was found in the Linux Test Project: the ebizzy utility. Although it is fairly old (2007), it has everything we need:
ebizzy is designed to generate a workload resembling common web application server workloads. It is highly threaded and has a large in-memory working set. When running most efficiently, it will max out the CPU.
The code inside is quite simple; it is easy to work out what it does and how. Everything is clear and to the point.
The only change needed was to add an option to getopt() which, when present, makes ebizzy use an external allocator instead of the system one. I won't include the whole diff, it is trivial; here are just a couple of highlights, in case you want to implement your own:
```c
static void *alloc_mem(size_t size)
{
	char *p;
	int err = 0;

	/* We are not testing mmap, so ignore this branch: */
	if (always_mmap) {
		p = mmap((void *) 0, size, (PROT_READ | PROT_WRITE),
			 (MAP_PRIVATE | MAP_ANONYMOUS), -1, 0);
		if (p == MAP_FAILED)
			err = 1;
	} else {
		/* HERE IT IS: */
		p = use_external_alloc ? external_alloc(size) : malloc(size);
```
And here:
```c
static void free_mem(void *p, size_t size)
{
	if (always_mmap)
		munmap(p, size);
	else
		/* AND HERE IT IS AGAIN: */
		use_external_alloc ? external_free(p) : free(p);
}
```
Now let's run the tests. We add the -t 1 option everywhere (single-threaded run). First, a standard run with the system allocator and a fixed chunk size. The default depends on the hardware/OS; in this case it is 524288 bytes, i.e. 512 KB.
```
$ ./ebizzy -t 1
4484 records/s
real 10.00 s
user  3.47 s
sys   7.17 s
```
A run with the external allocator:
```
$ ./ebizzy -k -t 1
16553 records/s
real 10.00 s
user 10.45 s
sys   0.09 s
```
The same, but instead of fixed-size chunks, chunks of random size (up to 512 KB) are generated:
```
$ ./ebizzy -R -t 1
75828 records/s
real 10.00 s
user  8.98 s
sys   2.28 s

$ ./ebizzy -R -k -t 1
89585 records/s
real 10.00 s
user 11.20 s
sys   0.07 s
```
As you can see, the user+sys totals within each pair are almost identical, yet in the first pair the throughput differs by almost a factor of 4! With random chunk sizes the gap is smaller, but it is there and holds up stably over many runs. It is too early to celebrate, though: once the chunk size is reduced to 1 KB (and in many workloads that is by far the more common situation), the system allocator takes the lead:
```
$ ./ebizzy -s 1024 -t 1
5394867 records/s
real 10.00 s
user 11.84 s
sys   0.22 s

$ ./ebizzy -s 1024 -k -t 1
4953828 records/s
real 10.00 s
user 11.89 s
sys   0.13 s
```
In general, you can run it in all sorts of configurations, analyze, and draw conclusions; run gprof/gcov and dig further. The main thing is that the technique has been mastered, and the utility has fully earned its keep.