I found some interesting material in the C# specification, section A.8 "Dynamic memory allocation".

I reworked it slightly to run on x64:

    internal unsafe class Memory
    {
        // Heap API flags
        private const uint HEAP_ZERO_MEMORY = 0x00000008;

        // Heap API functions
        [DllImport("Kernel32")]
        private static extern IntPtr GetProcessHeap();

        [DllImport("Kernel32")]
        private static extern int HeapSize(IntPtr hHeap, uint dwFlags, IntPtr lpMem);

        [DllImport("Kernel32")]
        private static extern IntPtr HeapAlloc(IntPtr hHeap, uint dwFlags, UIntPtr dwBytes);

        [DllImport("Kernel32")]
        private static extern IntPtr HeapReAlloc(IntPtr hHeap, uint dwFlags, IntPtr lpMem, UIntPtr dwBytes);

        [DllImport("Kernel32")]
        private static extern bool HeapFree(IntPtr hHeap, uint flags, IntPtr lpMem);

        // Handle for the process heap. This handle is used in all calls to the
        // HeapXXX APIs in the methods below.
        private static IntPtr ph = GetProcessHeap();

        // Private instance constructor to prevent instantiation.
        private Memory() { }

        // Allocates a memory block of the given size. The allocated memory is
        // automatically initialized to zero.
        public static void* Alloc(uint size)
        {
            void* result = HeapAlloc(ph, HEAP_ZERO_MEMORY, new UIntPtr(size)).ToPointer();
            if (result == null) throw new OutOfMemoryException();
            return result;
        }

        // Copies count bytes from src to dst. The source and destination
        // blocks are permitted to overlap.
        public static void Copy(void* src, void* dst, int count)
        {
            byte* ps = (byte*)src;
            byte* pd = (byte*)dst;
            if (ps > pd)
            {
                for (; count != 0; count--) *pd++ = *ps++;
            }
            else if (ps < pd)
            {
                for (ps += count, pd += count; count != 0; count--) *--pd = *--ps;
            }
        }

        // Frees a memory block.
        public static void Free(void* block)
        {
            if (!HeapFree(ph, 0, new IntPtr(block)))
                throw new InvalidOperationException();
        }

        // Re-allocates a memory block. If the reallocation request is for a
        // larger size, the additional region of memory is automatically
        // initialized to zero.
        public static void* ReAlloc(void* block, uint size)
        {
            void* result = HeapReAlloc(ph, HEAP_ZERO_MEMORY, new IntPtr(block), new UIntPtr(size)).ToPointer();
            if (result == null) throw new OutOfMemoryException();
            return result;
        }

        // Returns the size of a memory block.
        public static int SizeOf(void* block)
        {
            int result = HeapSize(ph, 0, new IntPtr(block));
            if (result == -1) throw new InvalidOperationException();
            return result;
        }
    }

I’m still a novice, so I’m asking more experienced people to review and test this code. It will mostly serve as a replacement for a standard byte array (byte[]) in a multi-threaded application.

P.S. According to preliminary measurements, HeapAlloc is faster than the standard stackalloc. Admittedly, I didn’t run proper benchmarks; I just used a Stopwatch.

  • For a start, native memory allocation has a negative impact on the garbage collector. Why do you need native memory in C#? I suspect this is a false economy. - VladD
  • @VladD, the allocation and deallocation speeds turn out to be even faster than stackalloc. I need it for performance. - Align
  • I can’t believe you can manage memory better than the GC does; it’s a complicated and clever piece of machinery. Perhaps you just need to reduce the number of small temporary objects. And maybe some of them should really be structs. - VladD
  • Yes, exactly: a buffer pool, essentially your own allocator if you like. - VladD
  • If you’ve gotten as far as optimizing around the GC, up to the point of abandoning it entirely, then most likely you need to fight not the GC but memory traffic: stop creating temporary objects in large quantities and reuse them as much as possible, e.g. string interning and the like. - rdorn

1 answer

Regarding the idea:

according to preliminary data HeapAlloc is faster than standard stackalloc

This is strange. As a rule, managed memory is allocated faster than native memory: a managed allocation only has to bump a pointer, while a native allocation has to search a free list for a sufficiently large block. Over time fragmentation sets in and the search takes longer. Except in one case.

You are constantly allocating memory for short-lived objects of the same size. In that case blocks of the same size are constantly taken and released in native memory, and they quickly end up at the front of the free list.

With managed memory, on the other hand, nothing is freed in place, so the GC has to compact (defragment) the heap constantly, which takes considerable time.

This problem should be solved differently:

  • Try replacing the class with a struct so the data lives on the stack.
  • Try making a reusable pool of class objects with a sufficient number of elements. The pool will be promoted to the long-lived generations and stop interfering with memory allocation.
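For illustration, here is a minimal sketch of such a pool (the BufferPool name and API are made up for this example; in modern .NET, System.Buffers.ArrayPool&lt;byte&gt; provides a production-ready equivalent):

```csharp
using System;
using System.Collections.Concurrent;

// A minimal, illustrative buffer pool: rented arrays are reused instead of
// being re-allocated, so after surviving a few collections they settle in
// the long-lived generations and stop generating memory traffic.
internal sealed class BufferPool
{
    private readonly ConcurrentBag<byte[]> _bag = new ConcurrentBag<byte[]>();
    private readonly int _bufferSize;

    public BufferPool(int bufferSize) { _bufferSize = bufferSize; }

    // Take a buffer from the pool, or allocate a fresh one if the pool is empty.
    public byte[] Rent()
    {
        return _bag.TryTake(out byte[] buffer) ? buffer : new byte[_bufferSize];
    }

    // Put a buffer back so the next Rent can reuse it.
    public void Return(byte[] buffer)
    {
        if (buffer != null && buffer.Length == _bufferSize) _bag.Add(buffer);
    }
}

internal static class Program
{
    private static void Main()
    {
        var pool = new BufferPool(4096);
        byte[] a = pool.Rent();
        pool.Return(a);
        byte[] b = pool.Rent(); // the same instance comes back from the pool
        Console.WriteLine(ReferenceEquals(a, b)); // True
    }
}
```

ConcurrentBag is used here because the question mentions a multi-threaded application; both Rent and Return are thread-safe without extra locking.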

Regarding the code:

 internal unsafe class Memory 

This class does not require unsafe code at all.

 [DllImport("Kernel32")] private static extern int HeapSize(IntPtr hHeap, uint dwFlags, IntPtr lpMem); 

The return type is not int . It should be IntPtr , cast to long , to honor the platform's bitness. Strictly speaking, the native type is SIZE_T : https://msdn.microsoft.com/ru-ru/library/windows/desktop/aa366706(v=vs.85).aspx
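A corrected declaration might look like the sketch below (names are illustrative; UIntPtr is used here as the managed counterpart of the unsigned, pointer-sized SIZE_T, and HeapSize reports failure as (SIZE_T)-1):

```csharp
using System;
using System.Runtime.InteropServices;

internal static class NativeHeap
{
    [DllImport("Kernel32")]
    internal static extern IntPtr GetProcessHeap();

    [DllImport("Kernel32")]
    internal static extern IntPtr HeapAlloc(IntPtr hHeap, uint dwFlags, UIntPtr dwBytes);

    [DllImport("Kernel32")]
    internal static extern bool HeapFree(IntPtr hHeap, uint dwFlags, IntPtr lpMem);

    // SIZE_T is pointer-sized, so UIntPtr matches on both x86 and x64.
    [DllImport("Kernel32")]
    internal static extern UIntPtr HeapSize(IntPtr hHeap, uint dwFlags, IntPtr lpMem);

    // Returns the block size as a long, honoring 32/64-bit SIZE_T.
    internal static long SizeOf(IntPtr hHeap, IntPtr block)
    {
        UIntPtr result = HeapSize(hHeap, 0, block);
        // Failure is (SIZE_T)-1: all bits set for the platform's word size.
        ulong failure = IntPtr.Size == 8 ? ulong.MaxValue : uint.MaxValue;
        if (result.ToUInt64() == failure) throw new InvalidOperationException("HeapSize failed");
        return (long)result.ToUInt64();
    }
}

internal static class Program
{
    private static void Main()
    {
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            // Exercise the declaration for real on Windows.
            IntPtr heap = NativeHeap.GetProcessHeap();
            IntPtr block = NativeHeap.HeapAlloc(heap, 0, new UIntPtr(64));
            Console.WriteLine(NativeHeap.SizeOf(heap, block) >= 64); // True
            NativeHeap.HeapFree(heap, 0, block);
        }
        else
        {
            // On non-Windows systems we can only check that the declaration compiles.
            Console.WriteLine(true);
        }
    }
}
```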

 void* result = (...).ToPointer(); if (result == null) throw new ...; return result; 

There is no need to round-trip through void* ; compare the IntPtr against IntPtr.Zero directly:

 IntPtr result = ...; if (result == IntPtr.Zero) throw new ...; return result; 
 public static void Copy(void* src, void* dst, int count) 

Traditionally this operation is called Move , not Copy : Copy implies the blocks must not overlap, while Move allows it. And there is already a ready-made function for it: RtlMoveMemory .
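To illustrate why the distinction matters, here is a safe managed sketch of the same forward/backward trick the question's Copy method implements (the Move name and array-based signature are made up for this example):

```csharp
using System;

internal static class Program
{
    // Overlap-aware move within one array: copy forward when the destination
    // precedes the source, backward when it follows, so overlapping ranges
    // are never overwritten before they are read.
    private static void Move(byte[] buffer, int srcIndex, int dstIndex, int count)
    {
        if (dstIndex < srcIndex)
        {
            for (int i = 0; i < count; i++) buffer[dstIndex + i] = buffer[srcIndex + i];
        }
        else if (dstIndex > srcIndex)
        {
            for (int i = count - 1; i >= 0; i--) buffer[dstIndex + i] = buffer[srcIndex + i];
        }
    }

    private static void Main()
    {
        byte[] data = { 1, 2, 3, 4, 5, 0, 0 };
        Move(data, 0, 2, 5); // overlapping shift right by two elements
        Console.WriteLine(string.Join(",", data)); // 1,2,1,2,3,4,5
    }
}
```

A naive forward-only copy would have produced 1,2,1,2,1,2,1 here, clobbering the source before it was read; that is exactly the case Copy semantics do not promise to handle.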

 throw new OutOfMemoryException(); throw new InvalidOperationException(); 

Couldn’t you add at least some description to these exceptions?

 throw new OutOfMemoryException(); 

I’m not sure you can even request more memory (for the exception object) when the heap has run out.
Though for some reason it seems to me that you can.

  • By the way, yes: HeapAlloc is a system call, so it’s unclear how it could be faster than ordinary allocation on the stack. I think profiling will show it’s slower, and by a lot. - VladD
  • @VladD, compaction eats up the difference. That is, if you allocate 8 native bytes, then 8 managed bytes, free the first 8 and repeat a million times, the native heap will probably keep handing you back the same block. But if everything came from the managed heap, compaction would constantly be running there, since some of the objects survive. I think that can genuinely produce a win, but the problem should still be solved differently :) - Qwertiy
  • I don’t get it! Where does fragmentation come from if it’s stackalloc? Everything is on the stack; when the function returns, the frame dies. - VladD
  • @VladD, oh, I was thinking of the managed heap. In the general case only the reference would be on the stack; the object itself lives on the heap. - Qwertiy