Fair warning right away: this question is part of a test assignment I was given when applying for a junior position. Please don't beat me with sticks, at least point me in the right direction. The task is limited to .NET 3.5.

Now, the question itself:

There is a file more than 30 GB in size, and the task is to compress it properly. The planned algorithm (a rough sketch follows the list):

  1. Open two FileStreams, one for reading and one for writing.
  2. Read the file in small portions (10 MB each) and put them into an array of source blocks.
  3. Start a couple of threads that take blocks from this array, compress them, and put them into an array of compressed blocks.
  4. The writing FileStream picks up blocks from the array of compressed blocks and writes them to the output file.
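To make steps 1-4 concrete, here is a minimal sketch of such a pipeline (a sketch only: the file names, the 10 MB block size, the two worker threads, and the cap of 8 blocks in flight are illustrative assumptions, and each block is compressed with GZipStream as per the update further down). Note that to be usable at all it already limits how many blocks sit in memory and makes the writer wait for the block with the next index; whether that is the right approach is exactly what the questions below are about.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Threading;

class Block { public int Index; public byte[] Data; }

static class PipelineSketch
{
    const int BlockSize = 10 * 1024 * 1024; // 10 MB portions, as in step 2
    const int MaxPending = 8;               // cap on blocks held in RAM at once

    static readonly Queue<Block> pending = new Queue<Block>();                          // source blocks
    static readonly Dictionary<int, byte[]> compressed = new Dictionary<int, byte[]>(); // compressed blocks by index
    static readonly object gate = new object();
    static bool readingDone;
    static int totalBlocks = -1;

    static void Main()
    {
        Thread[] workers = new Thread[2];                 // "a couple of threads", step 3
        for (int i = 0; i < workers.Length; i++)
            (workers[i] = new Thread(CompressLoop)).Start();
        Thread writer = new Thread(WriteLoop);
        writer.Start();

        int index = 0;
        using (FileStream src = File.OpenRead("huge.bin"))         // hypothetical input name
        {
            while (true)
            {
                byte[] buf = new byte[BlockSize];
                int read = src.Read(buf, 0, buf.Length);
                if (read == 0) break;                               // end of file
                if (read < buf.Length) Array.Resize(ref buf, read);
                lock (gate)
                {
                    // back-pressure: stop reading ahead while too much is in flight
                    while (pending.Count + compressed.Count >= MaxPending) Monitor.Wait(gate);
                    pending.Enqueue(new Block { Index = index++, Data = buf });
                    Monitor.PulseAll(gate);
                }
            }
        }
        lock (gate) { readingDone = true; totalBlocks = index; Monitor.PulseAll(gate); }

        foreach (Thread t in workers) t.Join();
        writer.Join();
    }

    static void CompressLoop()
    {
        while (true)
        {
            Block block;
            lock (gate)
            {
                while (pending.Count == 0 && !readingDone) Monitor.Wait(gate);
                if (pending.Count == 0) return;                     // reader finished, nothing left
                block = pending.Dequeue();
            }
            using (MemoryStream ms = new MemoryStream())
            {
                using (GZipStream gz = new GZipStream(ms, CompressionMode.Compress, true))
                    gz.Write(block.Data, 0, block.Data.Length);
                lock (gate) { compressed[block.Index] = ms.ToArray(); Monitor.PulseAll(gate); }
            }
        }
    }

    static void WriteLoop()
    {
        using (FileStream dst = File.Create("huge.bin.gz"))         // hypothetical output name
        {
            for (int next = 0; ; next++)
            {
                byte[] data;
                lock (gate)
                {
                    // wait for the block with the next index, so the original order is kept
                    while (!compressed.ContainsKey(next))
                    {
                        if (readingDone && next == totalBlocks) return;  // everything written
                        Monitor.Wait(gate);
                    }
                    data = compressed[next];
                    compressed.Remove(next);
                    Monitor.PulseAll(gate);                         // wake the reader (back-pressure)
                }
                dst.Write(data, 0, data.Length);
            }
        }
    }
}
```

Since .NET 3.5 has no BlockingCollection, the back-pressure here is done by hand with Monitor.Wait/PulseAll; the output is simply a sequence of independent gzip members written in the original order.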

Several questions arise at once:

  1. If the file is larger than RAM, I cannot keep an array with all the blocks of this huge file in memory; it simply does not fit.

  2. Even if I somehow manage to remove already-compressed blocks from the source array, the same problem remains with the array of compressed blocks.

  3. Multithreading, even with just two threads, does not guarantee that blocks keep their positions in either array: the threads can run at different speeds, so the output turns into mush.

The conclusion is sad: this algorithm has to be anathematized.

What would a correct algorithm look like, so that the processing is done in several threads and the amount of data being processed at any given moment fits in RAM?

UPD: With a fresh head, the idea came up to lean on the hard disk instead of holding everything in RAM. Unfortunately, tons of reading about FileStream did not give me an understanding of what FileStream.Read() actually returns and whether it can be used in this context.
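As far as I understand the documented contract, FileStream.Read(buffer, offset, count) returns the number of bytes actually placed into the buffer: it is allowed to be less than count, and it is 0 only once the end of the stream is reached, so a fixed-size portion has to be accumulated in a loop. A minimal sketch of reading a file in 10 MB portions this way (the file name is illustrative):

```csharp
using System;
using System.IO;

class ReadBlocksDemo
{
    // Fills the buffer from the stream until it is full or the file ends;
    // returns how many bytes were actually placed into the buffer.
    static int ReadBlock(Stream stream, byte[] buffer)
    {
        int filled = 0;
        while (filled < buffer.Length)
        {
            // Read() returns the number of bytes read by this call: it may be
            // less than requested, and it is 0 only at the end of the stream.
            int read = stream.Read(buffer, filled, buffer.Length - filled);
            if (read == 0) break;
            filled += read;
        }
        return filled;
    }

    static void Main()
    {
        byte[] block = new byte[10 * 1024 * 1024];         // one 10 MB portion
        using (FileStream fs = File.OpenRead("huge.bin"))  // hypothetical file name
        {
            int length;
            while ((length = ReadBlock(fs, block)) > 0)
                Console.WriteLine("got a portion of {0} bytes", length);
        }
    }
}
```

So it seems it can be used in this context: each filled portion is exactly one block to hand to the compression threads.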

UPD: GZipStream is to be used for the compression.
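For reference, a minimal sketch of compressing one block with GZipStream through a MemoryStream, and of decompressing such a block back (the helper names are made up); each compressed block comes out as an independent gzip member:

```csharp
using System.IO;
using System.IO.Compression;

static class GZipBlock
{
    // Compresses one block into an independent gzip member.
    public static byte[] Compress(byte[] block)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            using (GZipStream gz = new GZipStream(ms, CompressionMode.Compress, true))
                gz.Write(block, 0, block.Length);
            return ms.ToArray();
        }
    }

    // Decompresses a single gzip member back into raw bytes.
    public static byte[] Decompress(byte[] compressed)
    {
        using (MemoryStream src = new MemoryStream(compressed))
        using (GZipStream gz = new GZipStream(src, CompressionMode.Decompress))
        using (MemoryStream dst = new MemoryStream())
        {
            byte[] buf = new byte[64 * 1024];
            int read;
            while ((read = gz.Read(buf, 0, buf.Length)) > 0)
                dst.Write(buf, 0, read);
            return dst.ToArray();
        }
    }
}
```

How such separately compressed members are later combined into one valid archive is exactly what the first comment below is about.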

  • 2
    There was a question about how to combine several blocks compressed separately: stackoverflow.com/questions/14744692/ ... Even though that question was about zlib, and there are answers from Mark Adler, one of the authors of zlib, the topic was not fully covered. A harsh little problem for a junior. - user239133
  • 1
    Multi-threaded compression is not possible that easily unless you have a block-based compression algorithm (that is, one that compresses blocks one by one). Accept it. - VladD
  • 4
    Or ask for triple the salary and write your own compression algorithm. - VladD
  • 1
    Or use something like github.com/force-net/blazer, although it is not multi-threaded; as already mentioned, multithreading is pointless here. - Daniel Protopopov
  • 2
    >> "There is a file more than 30 GB in size, and the task is to compress it properly" - for starters, clarify the question: what does "compress properly" actually mean, and why do you need to load everything into RAM at once? Let's put it this way: if the output format is not fixed (for example, compatibility with standard zip), everything is much simpler. Divide the file length into n (or nn) chunks == the number of threads you need, have each thread Seek() to its starting position, and compress its piece, reading in blocks of 4096 (or however many) bytes; see the sketch after this comment. As I understand it, that is the implementation the employer wants from you? - SeNS
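If I read that suggestion correctly, it comes out roughly like the sketch below (the input name, the thread count, and the per-thread .partN.gz output files are my own illustrative assumptions, not something the comment specifies): each thread opens its own FileStream over the same input, does a Seek() to the start of its range, and compresses that range in 4096-byte reads into its own output part.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Threading;

class ChunkedCompress
{
    static void Main()
    {
        string input = "huge.bin";                  // hypothetical input name
        const int threadCount = 4;                  // n == number of threads

        long length = new FileInfo(input).Length;
        long chunk = (length + threadCount - 1) / threadCount;

        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            long start = i * chunk;
            long size = Math.Min(chunk, length - start);
            string part = input + ".part" + i + ".gz";  // each thread writes its own part
            threads[i] = new Thread(delegate() { CompressRange(input, start, size, part); });
            threads[i].Start();
        }
        foreach (Thread t in threads) t.Join();
    }

    static void CompressRange(string path, long start, long size, string partPath)
    {
        using (FileStream src = File.OpenRead(path))        // each thread has its own stream
        using (FileStream dst = File.Create(partPath))
        using (GZipStream gz = new GZipStream(dst, CompressionMode.Compress))
        {
            src.Seek(start, SeekOrigin.Begin);              // jump to this thread's range
            byte[] buf = new byte[4096];
            long remaining = size;
            while (remaining > 0)
            {
                int want = (int)Math.Min(buf.Length, remaining);
                int read = src.Read(buf, 0, want);
                if (read == 0) break;                       // unexpected end of file
                gz.Write(buf, 0, read);
                remaining -= read;
            }
        }
    }
}
```

Each thread having its own FileStream avoids sharing a file position between threads; the price is that the result is n separate compressed parts rather than a single output file.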
