I am writing a program that scrapes a single site, using CsQuery to parse the pages. It needs to process a given range of pages in one run: the starting and ending links are specified, and the program walks every page in the range across several threads, extracting the needed information into a List so that it can be saved to a file at the end. The required number of threads is started; each one takes its page number from a shared counter of the current page and works on it. The threads run a while loop so that they do not exit until the last page has been parsed. When parsing is complete, all the information in the List is saved separately. The problem is that real runs will cover ranges of over a million pages, and already on a test run of 10,000 pages the program's memory usage grows past 1.5 GB. In a separate program I tried filling the List with random data of the same kind as the data being extracted: after adding 100,000 strings, the program's RAM usage did not exceed 100 MB. The parsing itself also works correctly and does not add any redundant data. I suspect my threading is wrong, and that the garbage collector is not freeing data from previous parsing passes. I have tried various approaches but could not fix the memory leak. Help me find the bug, or suggest a more correct way to work with threads. The code is attached.
class Program
{
    static int begin_of_post = 2950774; // starting post index
    static int end_of_post = 2951774;   // ending post index
    static int current_post;            // current post for the threads
    static List<string> list_posts = new List<string>(); // storage for post data

    static void Main(string[] args)
    {
        ServicePointManager.DefaultConnectionLimit = 1000000000; // number of simultaneous connections
        current_post = begin_of_post;
        Thread my_tr;
        for (int i = 0; i < 10; i++) // start the threads
        {
            my_tr = new Thread(parse_site);
            my_tr.Start();
        }
        Console.ReadLine();
        save_to_file();
    }

    static void parse_site()
    {
        while (current_post <= end_of_post)
        {
            // atomically take the next post number (Increment returns the new value,
            // so subtract 1; reading the counter and incrementing it separately would race)
            int link_to_post = Interlocked.Increment(ref current_post) - 1; // link to the post
            CQ cq;
            try
            {
                cq = CQ.CreateFromUrl("http://site.ru/" + link_to_post); // download the page
            }
            catch
            {
                Console.WriteLine("Error " + link_to_post);
                continue;
            }
            string post_info;
            ... // the actual parsing of the page ...
            int current = link_to_post - begin_of_post;
            int end = end_of_post - begin_of_post;
            Console.WriteLine("Processed link " + current + " OF " + end);
            Thread my_tr_save = new Thread(save_post);
            my_tr_save.Start(post_info);
        }
    }

    static void save_post(object post_info)
    {
        ... // parsing of the page information ...
        lock (list_posts)
        {
            list_posts.Add(post_info.ToString());
        }
    }

    static void save_to_file()
    {
        ... // saving the list_posts strings to a file ...
    }
}