I'm writing an internal memory allocator. In principle, speed is not critical, but I still want to measure its efficiency and try to find bottlenecks.

Trivial tests like "allocate/free a million times and compare with malloc/free" don't seem meaningful.

The only idea I've come up with so far is to take some relatively heavy, realistic computational task that allocates and frees memory intensively, and run it on top of the allocator. What could that be? Working with trees, something else? I'd like real examples, but nothing comes to mind.

Or are there other options?

  • I would take some 3D game (the classic Doom/Quake) and plug the allocator into it, then look at the difference. In dynamic games it is usually noticeable. But you'd need to check whether they use their own internal allocators :). - KoVadim
  • @KoVadim, that would be too subjective an assessment. I want precise numbers :) and to study gprof data. Besides, it is not meant as a global replacement for the system allocator — it is a special case (for now) where all allocation happens inside a single previously reserved region, say a few megabytes. - PinkTux
  • Then you need to run benchmarks on your specific task. - KoVadim
  • That's exactly what I'd like to avoid. Besides, it's not easy to get that task into the right shape — memory there is allocated a teaspoon an hour... - PinkTux
  • If memory is allocated a teaspoon an hour, then I see no point in a custom allocator, or in hunting for bottlenecks at all. - KoVadim

1 answer

It turns out everything has already been invented before us. What I needed was found in the Linux Test Project: the ebizzy utility. Although it is quite old (2007), it has everything required:

ebizzy is designed to generate a workload resembling common web application server workloads. It is highly threaded and has a large in-memory working set. When running most efficiently, it will max out the CPU.

The code inside is fairly simple; you can easily figure out what it does and how. Everything is clear and to the point.

The only change needed was to add a getopt() flag that, when present, makes it use an external allocator instead of the system one. I won't give the whole diff — it is trivial — just a couple of spots, in case you want to implement your own:

 static void *alloc_mem(size_t size)
 {
     char *p;
     int err = 0;

     /* We are not testing mmap, so ignore this branch: */
     if (always_mmap) {
         p = mmap((void *) 0, size, (PROT_READ | PROT_WRITE),
                  (MAP_PRIVATE | MAP_ANONYMOUS), -1, 0);
         if (p == MAP_FAILED)
             err = 1;
     } else {
         /* HERE IT IS: */
         p = use_external_alloc ? external_alloc(size) : malloc(size);

And here:

 static void free_mem(void *p, size_t size)
 {
     if (always_mmap)
         munmap(p, size);
     else
         /* AND HERE: */
         use_external_alloc ? external_free(p) : free(p);
 }

Now we run the tests. Everywhere we add the -t 1 flag (work in a single thread). First, a standard run with the system allocator and a fixed memory block size. The default depends on the hardware/OS; in this case it is 524288 bytes, i.e. 512 KB.

 $ ./ebizzy -t 1
 4484 records/s
 real 10.00 s
 user  3.47 s
 sys   7.17 s

Run with the external allocator:

 $ ./ebizzy -k -t 1
 16553 records/s
 real 10.00 s
 user 10.45 s
 sys   0.09 s

The same, but instead of fixed-size blocks, blocks of random sizes up to 512 KB are generated:

 $ ./ebizzy -R -t 1
 75828 records/s
 real 10.00 s
 user  8.98 s
 sys   2.28 s

 $ ./ebizzy -R -k -t 1
 89585 records/s
 real 10.00 s
 user 11.20 s
 sys   0.07 s

As you can see, the user+sys totals in both pairs are almost the same, yet the throughput in the first pair differs by almost a factor of four! With random-size blocks the difference is smaller, but it is there, and it holds up consistently over many runs. It is too early to celebrate, though: once block sizes drop to 1 KB, the system allocator takes the lead (and in many workloads that is by far the more common situation):

 $ ./ebizzy -s 1024 -t 1
 5394867 records/s
 real 10.00 s
 user 11.84 s
 sys   0.22 s

 $ ./ebizzy -s 1024 -k -t 1
 4953828 records/s
 real 10.00 s
 user 11.89 s
 sys   0.13 s

In general, you can run it in various modes, analyze, and draw conclusions — then run gprof/gcov and dig further. The main thing is that the methodology is in place, and the utility has fully proved its worth.