I am trying to implement a boundless, self-expanding array myself, initially 2 GB and growing from there. That is, when an element of the array is accessed, check whether its cluster is in RAM: if it is, hand it out; if not, load the cluster from a file. If there are too many clusters, the less-used ones get unloaded. In other words, something like inexhaustible virtual memory. The goal is to cache the result of a query from any DBMS, for example if it returned a million rows.

Maybe there is a ready-made library that implements this through the IStream interface for working with files, or something like that? Some of the DBMSs are 32-bit, so the solution should work in both 32-bit and 64-bit. VirtualAlloc does not fit (the 32-bit address space is limited).

Ideally, something like this:

 class BigArea {
     char& operator[](__int64 index) {
         char * tmp_buff;
         //...
         return tmp_buff[index & 4095]; // because a memory block is 4096 bytes
     }
 };

As a source of "inexhaustible" memory. Is there a library that will allow you to cache 2-4-10 Gb of memory, placing the "extra" on the disk, and the most used memory in RAM?

  • Not very clear what the question really is. - αλεχολυτ
  • If you need to unload several gigabytes to disk - read and write in chunks. It will be fast and reliable. Trying to keep everything in memory leads to nothing good. Of course, I wonder what you would then do with a gigabyte CSV. You can't even load that into Excel :) - KoVadim
  • It seems to me that someone is trying to reinvent mmap. - Mikalai Ramanovich
  • Maybe I don't understand well, but if you just allocate that much memory, won't the operating system itself handle the swapping? .. - Mikhailo
    1. Memory mapped file. 2. I highly recommend reading and using: bbs.vbstreets.ru/viewtopic.php?p=6784461#p6784461 - Qwertiy

3 answers

Under WinAPI, simply dump everything into a temporary file (the FILE_ATTRIBUTE_TEMPORARY flag), then, without closing the file, map into memory the parts you need to read. That's all!

In Windows (at least since 7) there is a disk cache that keeps frequently or recently read pieces of files in RAM - that is, it does exactly what you want to do. The FILE_ATTRIBUTE_TEMPORARY flag tells the file system that what you are creating is, characteristically, a temporary file, and that the blocks of this file should therefore be kept in the cache. If there is enough memory for the cache, of course.

I do not recommend using RAM for this. There is no managed swap in WinAPI; swapping is an automatic system-wide process. In other words, swapping begins when there is not enough memory for everyone, that is, when the whole system is already choking. And evicting 10 GB can easily bring that about.

  • So, use a mapping, or just take FileRead/FileWrite and not fool around with pages? - nick_n_a
  • @nick_n_a I meant mapping the file into memory. I believe it works faster. - Cerbo
  • Agreed. I had planned a list of memory chunks + mapping, but it turns out only the mapping is needed. Now I just need to find examples with mapping ... and to read up on whether there is a "memory block modified" flag, so that not every block gets flushed to disk ... or implement that feature manually. In any case, once I implement it, I will report back. - nick_n_a
  • You can't look for a "block modified" flag, since there is no such thing. Changes will have to be tracked differently; that is a separate question. - Cerbo
  • It seems to me that everything described ultimately resembles mmap . Working with a "boundless" buffer through it has many advantages; I recommend starting the tests with it, and the code then simply navigates through the buffer depending on the data. The remaining tasks seem to be solved inside already and require no further work. - NewView

You can easily write your own manager structure that unloads old chunks to disk and loads them back when they are accessed; you don't even need any virtualization mechanism. The question is simply what kind of access to such a large data area is needed. If you only ever access it in chunks of limited size, rather than randomly across all the data at once, then you can simply provide, for example, a get(offset, size, dst_buf) function that asks the structure for the required part and reads it into the buffer, while popular chunks are cached in RAM and unpopular ones live on disk.

That is, the structure should:

  1. store inside itself the large piece of data handed to it from the DBMS.
  2. when storing the data, cut it into chunks of, say, 64 KB, and always operate on such chunks.
  3. keep frequently used chunks in RAM and rarely used ones on disk.
  4. return any segment of the stored data at the requested offset and of the requested size, writing it into the specified buffer (this is the get function mentioned above).
  5. be able to overwrite any segment within the stored data with a given block of bytes.

Such a structure can use memory very economically, in the sense that a large volume of data is no obstacle to it, because it keeps the data in small chunks and does not allocate one large contiguous area in RAM. By the way, only big data should be stored in it; small pieces should go into a second structure that allocates a contiguous section classically, since the big structure works in 64 KB chunks, which means small pieces would occupy RAM very inefficiently.

  • It seems to me that this answer is the same as the question, only slightly reworded. I agree, of course, that all of this is possible. Your solution is just missing one more piece - and so it does not solve the problem. - nick_n_a
  • @nick_n_a On the contrary, I proposed a working design for the structure, which solves both the problem of memory fragmentation when allocating large chunks and the problem of keeping unused areas on disk; i.e. it is in fact exactly the answer to the question, because the question was how to solve these two problems. The only thing that may be missing is the complete code implementing such a structure. - Arty OneSoul

Since there was still no implementation, I put together this "skeleton".

 class TBigMem {
 private:
     HANDLE hFile;
     HANDLE hMap;
     __int64 hIndex;  // the type can be changed to whatever your IDE provides
     __int64 maxsize;
     void * mWnd;
     DWORD AllocationGranularity; // the window size must be a multiple of this

     bool DoMap(unsigned long * index) {
         if (hIndex >= 0) UnmapViewOfFile(mWnd);
         hIndex = *(__int64*)index;
         void * wnd = MapViewOfFileEx(hMap, FILE_MAP_ALL_ACCESS,
             index[1], index[0], AllocationGranularity, mWnd);  /* same window */
         if (wnd == 0)
             wnd = MapViewOfFileEx(hMap, FILE_MAP_ALL_ACCESS,
                 index[1], index[0], AllocationGranularity, NULL); /* new window */
         if (wnd == 0) { /* handle the error - most likely out of memory */ }
         if (wnd != 0) mWnd = wnd;
         return wnd != 0;
     }
 public:
     TBigMem(__int64 size) { // the size could be made a constant
         char buff[512];
         SYSTEM_INFO si;
         GetSystemInfo(&si);
         AllocationGranularity = si.dwAllocationGranularity;
         GetTempPathA(sizeof(buff), buff);
         GetTempFileNameA(buff, "bigmem", 0, buff); // prefix for the temporary file
         hFile = CreateFileA(buff, GENERIC_READ | GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
             FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE, 0);
         hMap = 0;   // was left uninitialized in the first version
         mWnd = NULL;
         hIndex = -1;
         Realloc(size);
     }
     bool Realloc(__int64 size) {
         if (hFile == INVALID_HANDLE_VALUE) return false;
         if (hIndex >= 0) UnmapViewOfFile(mWnd);
         if (hMap) CloseHandle(hMap);
         hMap = 0;
         hIndex = -1;
         hMap = CreateFileMappingA(hFile, 0, PAGE_READWRITE,
             ((unsigned long*)&size)[1], ((unsigned long*)&size)[0], 0);
         if (hMap != 0) maxsize = size;
         return hMap != 0;
     }
     bool isValid() { return hMap != 0; }
     ~TBigMem() {
         if (hIndex >= 0) UnmapViewOfFile(mWnd);
         if (hMap) CloseHandle(hMap);
         if (hFile != INVALID_HANDLE_VALUE) CloseHandle(hFile);
     }
     char& operator[](__int64 index) {
         union { __int64 i; unsigned long w[2]; } u;
         if (index >= maxsize) // auto-grow, aligned to the granularity
             Realloc(index - (index & (AllocationGranularity - 1)) + AllocationGranularity);
         if (hMap == 0) return *(char*)NULL; // should not happen; handle the error
         unsigned long offs;
         u.i = index;
         offs = u.w[0] & (AllocationGranularity - 1);
         u.w[0] &= ~(AllocationGranularity - 1);
         // TODO: the index could be shifted here to make the window wider
         if (u.i != hIndex)
             DoMap(&u.w[0]); // on failure, check GetLastError()
         return *((char*)mWnd + offs);
     }
     unsigned int GetAvalible(__int64 index) { // number of bytes available in the window
         return AllocationGranularity - ((unsigned int)(index & (AllocationGranularity - 1)));
     }
 };

Tested on small volumes like this:

 char c;
 TBigMem data(1000000);
 data[0] = 1;
 data[65536] = 2;
 c = data[0];     // switch back to the first page
 c = data[65536]; // switch to the other page

The data is preserved. I had to figure out how to do the mapping with page granularity and how to switch the view (many examples show no page switching at all).

PS I am hoping for a more convenient implementation. At the moment, this is the only implementation example.

  • For char, this solution works, but for wider types you need to add a "double" buffer, because a value larger than one byte may land on a "two-page cut".

  • I ran into the fact that I could not find a sensible description of how to reserve memory for MapViewOfFileEx.

  • For "extended" functionality you may need templates ( template ), but that is for the future.