The strings are of different lengths and made up of different characters, so the counting-sort option won't work. I implemented an external merge sort: I split the file into 100 MB parts, sort each part, and then merge them. All of this takes me about 10 minutes, while a third-party tool sorts the same volume and removes duplicates from it in 6 minutes.
My code is attached below.
public static class Sorting
{
    public static void ExternalMergeSort(string originalFile, string newFile)
    {
        // Split the file into parts of ~100 MB
        string dir = SplitFile(originalFile);

        // Sort each part
        foreach (string file in Directory.GetFiles(dir, "*.txt", SearchOption.AllDirectories))
            InternalSort(file);

        // Multithreaded variant - runs out of memory
        //List<Task> tasks = new List<Task>();
        //foreach (string file in Directory.GetFiles(dir, "*.txt", SearchOption.AllDirectories))
        //{
        //    Task t = new Task(() => InternalSort(file));
        //    tasks.Add(t);
        //    t.Start();
        //}
        //Task.WaitAll(tasks.ToArray());

        // Merge the parts
        MergeFilesInDirectory(dir);
    }

    /// <summary>
    /// Splits the file into parts of the specified size
    /// </summary>
    /// <param name="originalFile">File to split</param>
    /// <param name="maxFileSize">Maximum size of one part. Default = 100 MB</param>
    /// <returns>Path to the folder containing the parts</returns>
    private static string SplitFile(string originalFile, double maxFileSize = 1e+8)
    {
        TimeWatcher.Start("SplitFile");
        var lines = File.ReadLines(originalFile);
        string dir = Path.GetDirectoryName(originalFile);
        string extDir = dir + "/" + Path.GetFileNameWithoutExtension(originalFile);
        if (!Directory.Exists(extDir))
            Directory.CreateDirectory(extDir);

        string partPath = extDir + "/" + Guid.NewGuid().ToString() + Path.GetExtension(originalFile);
        var outputFile = new StreamWriter(File.OpenWrite(partPath));
        foreach (string line in lines)
        {
            outputFile.WriteLine(line);
            if (outputFile.BaseStream.Position >= maxFileSize)
            {
                outputFile.Close();
                partPath = extDir + "/" + Guid.NewGuid().ToString() + Path.GetExtension(originalFile);
                outputFile = new StreamWriter(File.OpenWrite(partPath));
            }
        }
        outputFile.Close(); // close the last (possibly partial) part as well
        TimeWatcher.Show("SplitFile", true);
        return extDir;
    }

    /// <summary>
    /// In-memory sort of a single file
    /// </summary>
    /// <param name="originalFile">File to sort</param>
    public static void InternalSort(string originalFile)
    {
        TimeWatcher.Start("InternalSort");
        List<string> list = File.ReadAllLines(originalFile).ToList();
        list.Sort();
        File.WriteAllLines(originalFile, list);
        TimeWatcher.Show("InternalSort", true);
    }

    /// <summary>
    /// Merges the files in the specified directory
    /// </summary>
    /// <param name="dir">Directory containing the files</param>
    private static void MergeFilesInDirectory(string dir)
    {
        TimeWatcher.Start("MergeFilesInDirectory");

        // Open all part files at once and build the current "layer" of head lines
        List<StreamReader> readers = new List<StreamReader>();
        List<string> layer = new List<string>();
        foreach (string file in Directory.GetFiles(dir, "*.txt", SearchOption.AllDirectories))
        {
            var reader = new StreamReader(File.OpenRead(file));
            readers.Add(reader);
            layer.Add(reader.ReadLine());
        }

        // Create the result file
        var writer = new StreamWriter(File.OpenWrite(dir + "/Result.txt"));
        int id = 0;
        while (layer.FirstOrDefault(x => x != null) != null)
        {
            string min = layer.Min();            // smallest current line across all parts
            id = layer.IndexOf(min);             // which part it came from
            layer[id] = readers[id].ReadLine();  // advance that reader (null at end of file)
            writer.WriteLine(min);
        }
        writer.Close();

        foreach (var reader in readers)
            reader.Close();

        // Remove the temporary parts, keeping only the merged result
        foreach (string file in Directory.GetFiles(dir, "*.txt", SearchOption.AllDirectories))
        {
            if (Path.GetFileNameWithoutExtension(file) != "Result")
                File.Delete(file);
        }

        TimeWatcher.Show("MergeFilesInDirectory", true);
    }
}

I found a way to raise the maximum object size (by default .NET limits a single object to 2 GB). Thanks to this, the entire file can safely be read into memory and split into pieces there. Reading 2 GB (from an SSD into DDR3) took 0.7 seconds.
<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
  ..

Details here: https://social.msdn.microsoft.com/Forums/en-RU/8d5880fe-108e-47d2-bbd7-4669e0aec1ec/-2-?forum=programminglanguageru
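As a rough illustration of that whole-file-in-memory approach, here is a minimal sketch (the InMemorySplit/SplitInMemory names and the line-count chunking are my own assumptions, not the original code) that reads everything at once and sorts fixed-size slices before writing each part out:

using System;
using System.IO;

public static class InMemorySplit
{
    public static void SplitInMemory(string originalFile, string outDir, int linesPerChunk = 1000000)
    {
        Directory.CreateDirectory(outDir);

        // Read the whole file at once; with gcAllowVeryLargeObjects enabled,
        // the backing array may grow past the usual 2 GB object limit.
        string[] all = File.ReadAllLines(originalFile);

        for (int start = 0; start < all.Length; start += linesPerChunk)
        {
            int count = Math.Min(linesPerChunk, all.Length - start);
            string[] chunk = new string[count];
            Array.Copy(all, start, chunk, 0, count);

            // Sort each slice while it is still in memory, then write it out once.
            Array.Sort(chunk);
            File.WriteAllLines(Path.Combine(outDir, Guid.NewGuid() + ".txt"), chunk);
        }
    }
}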
In the SplitFile method you read 100 MB and write it to a new file; then in the InternalSort method you re-read the same data. Change the algorithm: read 100 MB at a time from the large file into a list, sort it immediately, and write it out. After that, all that remains is the merge. - Alexander Petrov
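A minimal sketch of that suggestion, combining the split and the in-memory sort into a single pass (the SplitAndSort name and the byte-based size estimate are my own assumptions, not code from the comment):

using System;
using System.Collections.Generic;
using System.IO;

public static class SplitAndSortStep
{
    // Read the big file once, accumulate roughly maxChunkBytes of lines,
    // sort each batch in memory and write it straight to its part file.
    public static string SplitAndSort(string originalFile, long maxChunkBytes = 100000000)
    {
        string extDir = Path.Combine(
            Path.GetDirectoryName(originalFile),
            Path.GetFileNameWithoutExtension(originalFile));
        Directory.CreateDirectory(extDir);

        var chunk = new List<string>();
        long chunkBytes = 0;

        foreach (string line in File.ReadLines(originalFile))
        {
            chunk.Add(line);
            chunkBytes += line.Length + 2; // rough size estimate incl. line break

            if (chunkBytes >= maxChunkBytes)
            {
                WriteSortedChunk(extDir, chunk);
                chunk.Clear();
                chunkBytes = 0;
            }
        }
        if (chunk.Count > 0)
            WriteSortedChunk(extDir, chunk); // don't lose the last partial chunk

        return extDir; // the parts are already sorted and ready to be merged
    }

    private static void WriteSortedChunk(string dir, List<string> chunk)
    {
        chunk.Sort(); // same default comparer as the original InternalSort
        File.WriteAllLines(Path.Combine(dir, Guid.NewGuid() + ".txt"), chunk);
    }
}

With this, the separate InternalSort pass over each part is no longer needed; the merge step stays the same.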