There is a text file about 4 gigabytes in size, roughly 100 million lines. The data is structured line by line. I need to process this file with a certain algorithm and produce a new file as the result. The question is which way of working with the file is more efficient:

  1. Read the file sequentially in chunks, processing each chunk as you go, until the end of the file.
  2. Load the whole file into memory and work with all the data at once.

Which will perform better? I lean towards option 2, because I have heard about memory locality: supposedly it is better to keep similar data in neighboring slots rather than jump back and forth, and it is easier for the kernel to repeat the same operation than to switch between different tasks.

  • Reading a huge file into memory is a bad idea. It is better to process it line by line with File.ReadLines (not ReadAllLines!). - VladD
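
A minimal sketch of what the comment suggests (the path is a placeholder): File.ReadLines returns a lazily evaluated IEnumerable<string>, so lines are streamed one at a time, whereas File.ReadAllLines materializes the entire file as an array first.

    using System.IO;

    // File.ReadLines streams lines lazily; ReadAllLines would load everything at once.
    // The path is a placeholder.
    foreach (string line in File.ReadLines(@"D:\input.txt"))
    {
        // Process each line here.
    }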

1 answer

A good way to work with huge files is to read them in parts.

If you load the entire file, you consume a corresponding amount of RAM. Remember: when you play an HD movie that is 16 GB or so, it does not eat 16 GB of RAM, does it?

The same applies here.

If the file is a text file, read it line by line:

 string text = ""; using (StreamReader fs = new StreamReader(@"D:\1.txt")) { while (true) { // Читаем строку из файла во временную переменную. string temp = fs.ReadLine(); // Если достигнут конец файла, прерываем считывание. if(temp == null) break; // Пишем считанную строку в итоговую переменную или как нужно обрабатываем. text += temp; } } 
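
Since the goal in the question is to produce a new output file, a common streaming pattern (a sketch only: the paths and the Transform method are placeholders for "a certain algorithm") is to write each processed line straight out instead of accumulating everything in a string, so memory use stays constant regardless of file size:

    using System;
    using System.IO;

    class Program
    {
        // Placeholder for the question's "certain algorithm".
        static string Transform(string line) => line.ToUpperInvariant();

        static void Main()
        {
            // Paths are placeholders; adjust to your environment.
            using (var reader = new StreamReader(@"D:\input.txt"))
            using (var writer = new StreamWriter(@"D:\output.txt"))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    // One line in, one line out: memory use does not grow with file size.
                    writer.WriteLine(Transform(line));
                }
            }
        }
    }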

Sometimes it is easier to replace the file with a database and process the records in a table. If you decide to go that way, I advise using some kind of micro-ORM like PetaPoco.
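
A rough sketch of what that could look like with PetaPoco (the connection string, provider name, table and POCO names are all assumptions, and the exact way a Database is constructed differs between PetaPoco versions):

    using PetaPoco;

    // POCO for a row in a hypothetical LineRecord table.
    public class LineRecord
    {
        public long Id { get; set; }
        public string Payload { get; set; }
    }

    // Classic PetaPoco-style construction; connection details are placeholders.
    var db = new Database("Server=.;Database=Lines;Trusted_Connection=True;",
                          "System.Data.SqlClient");

    // Query<T> streams rows as they are read instead of materializing the whole
    // table (Fetch<T> would return everything as a List at once).
    foreach (var rec in db.Query<LineRecord>("SELECT Id, Payload FROM LineRecord"))
    {
        // Process rec.Payload here.
    }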

If the file is NOT a text file but a binary file, read it in blocks of bytes, if that is possible.
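
For the binary case, a minimal sketch using FileStream (the path and the 64 KB block size are arbitrary):

    using System.IO;

    // Read a binary file in fixed-size blocks.
    using (var fs = new FileStream(@"D:\1.bin", FileMode.Open, FileAccess.Read))
    {
        byte[] buffer = new byte[64 * 1024];
        int bytesRead;
        // Read returns 0 at end of file and may return fewer bytes than requested.
        while ((bytesRead = fs.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Process buffer[0..bytesRead) here.
        }
    }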