What is the best way to store a multidimensional dynamic array (from 120 GB up to 8-20 TB in size) and to access its elements without loading the entire array into RAM?

The main requirement is fast read access to the array's elements; nothing else matters. If necessary, any programming language can be used, not just C#.

  • And where do you think the array is stored initially? You wrote: "without loading the entire array into RAM". - Salivan
  • Initially it is created in RAM and filled with data. Then, when the OP is no longer large enough, I need to save it and somehow work with it from the hard disk. Is that possible? - Merlin
  • By "OP" you mean RAM, right? @Merlin, you are surely not new to programming, yet you have it all upside down: "when the OP is no longer large enough, I need to save it and somehow work with it from the hard disk". That is absurd! So you expect to store more than 1,000,000 values in the array!? Doesn't that seem strange to you? C# is quite a flexible language and takes most of the grunt work on itself, unlike C++, for example. But what you are trying to do is something the C# developers could not have foreseen =) - Salivan
  • @Merlin What does "all of it fit into the RAM for sure" mean? And by the way, what is this SDD (I don't have one)? I.e., the array does not fit in 128 GB? And if so, what hardware are you going to solve the problem on, and how long will it take? Is such a super-mega-project worth taking on at all? (Remember how many hundreds of personal computers Google initially ran its MapReduce on?) - alexlz
  • @alexlz, you can keep your opinion about the project to yourself. I asked a perfectly clear question: "what is the best way to store a multidimensional dynamic array (from 120 GB to 8-20 TB in size) and to access its elements without loading the entire array into RAM?" and "The main requirement is fast read access to the array's elements; if necessary, any programming language can be used, not just C#". That is the problem. If you do not see it, please do not engage in demagoguery. The PC is an ordinary mid-range desktop. - Merlin

1 answer

If the array is "dense", fast access will not be possible (although that depends on how the array is used).
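To make the dense case concrete, here is a minimal sketch of element access in a flat binary file: the multidimensional index is converted to a byte offset and a single element is read or written at that position. It assumes a fixed shape, row-major layout and 8-byte float elements, none of which come from the question; the names `ITEM`, `flat_offset`, `read_item` and `write_item` are hypothetical. Growing the array is not handled. Each access to a part of the file that is not cached still costs a disk access, which is the reason random reads over terabytes are unlikely to be fast.

```python
import os
import struct
import tempfile

# Assumption: every element is an 8-byte little-endian float.
ITEM = struct.Struct("<d")


def flat_offset(index, shape):
    """Byte offset of `index` in a row-major array of the given shape."""
    if len(index) != len(shape):
        raise IndexError("wrong number of indices")
    flat = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError(index)
        flat = flat * n + i
    return flat * ITEM.size


def read_item(f, index, shape):
    """Read one element without loading anything else from the file."""
    f.seek(flat_offset(index, shape))
    return ITEM.unpack(f.read(ITEM.size))[0]


def write_item(f, index, shape, value):
    """Overwrite one element in place."""
    f.seek(flat_offset(index, shape))
    f.write(ITEM.pack(value))


if __name__ == "__main__":
    shape = (4, 5, 6)
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "dense.bin"), "w+b") as f:
            # Reserve space for the whole array; unwritten bytes read as zeros.
            f.truncate(4 * 5 * 6 * ITEM.size)
            write_item(f, (1, 2, 3), shape, 42.5)
            print(read_item(f, (1, 2, 3), shape))
```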

A sparse array (one where the vast majority of elements are zero or some other predefined value) can be stored in a hash table keyed by the element's indices.
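The hash-table idea can be sketched in a few lines. This version lives entirely in RAM and only illustrates the data structure: the key is the tuple of indices, and elements equal to the predefined value are simply not stored. The class name `SparseArray` and its parameters are hypothetical.

```python
class SparseArray:
    """Multidimensional array that stores only the non-default elements."""

    def __init__(self, shape, default=0):
        self.shape = tuple(shape)
        self.default = default
        self._cells = {}  # maps an index tuple to its value

    def _normalize(self, index):
        # a[i] passes a bare int, a[i, j] passes a tuple
        if not isinstance(index, tuple):
            index = (index,)
        if len(index) != len(self.shape):
            raise IndexError("wrong number of indices")
        for i, n in zip(index, self.shape):
            if not 0 <= i < n:
                raise IndexError(index)
        return index

    def __getitem__(self, index):
        return self._cells.get(self._normalize(index), self.default)

    def __setitem__(self, index, value):
        index = self._normalize(index)
        if value == self.default:
            # Storing the default value means removing the entry.
            self._cells.pop(index, None)
        else:
            self._cells[index] = value

    def __len__(self):
        """Number of elements actually stored."""
        return len(self._cells)
```

With `a = SparseArray((1000, 1000, 1000))`, an assignment such as `a[1, 2, 3] = 7.5` stores a single entry, and every other element reads back as the default without occupying memory.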

  • Thanks. And how do I implement storing the array and accessing its elements outside RAM? - Merlin
  • Good question. For example, you can store it in a DBMS. But most likely you will need to devise a direct-access file format in which the hash table can be stored efficiently. The only real problem will be efficient access to collisions. The format has to be designed so that colliding elements fall into one "record" on disk (in general, so that they can be fetched with a single read). - avp
  • @Merlin, as I understand from other comments, you are going to use an SSD. The idea is good, especially for storing direct-access structures. The right approach to the format may differ from the one for an HDD, but as far as I know there are performance pitfalls. It is known that Oracle logs should not be placed on an SSD (it turns out to be faster on an HDD). Admittedly, I have not had a chance to experiment with them (SSDs, etc.) myself. - avp
  • @avp I do not want to use third-party databases yet; I will try experimenting with a file format, as you recommended. Could you tell me whether this article is suitable for the solution? habrahabr.ru/post/124900 I have never dealt with file formats before. - Merlin
  • I looked at the article on Habr. I do not think it fits your case. I meant binary file formats (with no claim to being cross-platform), i.e. a format that can be read and written quickly, almost without conversion. Something you can work with directly through the system calls read / write / lseek. Naturally, the addresses of the file parts read into memory have to be adjusted. Inside the parts you may be able to store offsets (indices) and work with those rather than directly with pointers. Generally speaking, I probably have a C implementation in mind here. - avp
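A minimal sketch of the kind of direct-access file avp describes, written in Python rather than C but using the same low-level calls (`os.lseek`, `os.read`, `os.write`). The layout is an assumption made for illustration, not a format taken from the thread: the file is a sequence of fixed-size 4096-byte buckets, the bucket number is the flat index modulo the number of buckets, and all colliding keys live in the same bucket, so one positioned read fetches them together. Records are located by byte offsets within the file, never by pointers. The names `DiskHash`, `SLOTS`, `HEADER` and `ENTRY` are hypothetical, and a full bucket simply raises an error where a real format would need overflow handling.

```python
import os
import struct
import tempfile

SLOTS = 255                        # entries per bucket (assumption)
HEADER = struct.Struct("<Q8x")     # used-slot count plus 8 padding bytes
ENTRY = struct.Struct("<Qd")       # flat index (uint64), value (float64)
BUCKET_SIZE = HEADER.size + SLOTS * ENTRY.size   # 16 + 255 * 16 = 4096


class DiskHash:
    """Hash table kept in a file; each bucket is one fixed-size record."""

    def __init__(self, path, n_buckets):
        self.n_buckets = n_buckets
        flags = os.O_RDWR | os.O_CREAT | getattr(os, "O_BINARY", 0)
        self.fd = os.open(path, flags)
        # Reserve all buckets; bytes never written read back as zeros.
        os.ftruncate(self.fd, n_buckets * BUCKET_SIZE)

    def _load(self, key):
        # Colliding keys share a bucket, so a single read fetches them all.
        offset = (key % self.n_buckets) * BUCKET_SIZE
        os.lseek(self.fd, offset, os.SEEK_SET)
        raw = os.read(self.fd, BUCKET_SIZE)
        (used,) = HEADER.unpack_from(raw, 0)
        entries = [ENTRY.unpack_from(raw, HEADER.size + i * ENTRY.size)
                   for i in range(used)]
        return offset, entries

    def get(self, key, default=0.0):
        _, entries = self._load(key)
        for k, v in entries:
            if k == key:
                return v
        return default

    def put(self, key, value):
        offset, entries = self._load(key)
        entries = [(k, v) for k, v in entries if k != key]
        entries.append((key, value))
        if len(entries) > SLOTS:
            raise OverflowError("bucket full; overflow handling is omitted")
        raw = HEADER.pack(len(entries)) + b"".join(
            ENTRY.pack(k, v) for k, v in entries)
        os.lseek(self.fd, offset, os.SEEK_SET)
        os.write(self.fd, raw)

    def close(self):
        os.close(self.fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        table = DiskHash(os.path.join(tmp, "table.bin"), n_buckets=8)
        table.put(3, 1.5)
        table.put(11, 2.5)          # 11 % 8 == 3, same bucket as key 3
        print(table.get(3), table.get(11), table.get(4))
        table.close()
```

The 4096-byte bucket is chosen here only because it matches a common page size; the best record size for a particular SSD or HDD would have to be found by experiment.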