In principle, I understand the general idea of threads. I am writing a program that parses a large XML file. First it downloads a zip archive and then extracts the XML inside it. That part of the code is, of course, not mine, which is probably why the form does not hang while it runs.

    using System;
    using System.ComponentModel;
    using System.Diagnostics;
    using System.IO;
    using System.IO.Compression;
    using System.Net;
    using System.Windows.Forms;

    namespace FenixBookParser
    {
        public partial class Form1 : Form
        {
            public string educational_lit_url = "zipurl";
            public static string files_path = @"C:\Parser\1139697.zip";
            WebClient webClient;
            Stopwatch sw = new Stopwatch();
            Parser LoadXml = new Parser();

            public Form1()
            {
                InitializeComponent();
                button1.Enabled = false;
            }

            private void Download_Unzip_Btn_Click(object sender, EventArgs e)
            {
                Download_Unzip_Btn.Enabled = false;
                // Start from an empty folder: remove leftovers from a previous run,
                // then (re)create the folder so the download target exists.
                if (Directory.Exists(@"C:\Parser\"))
                {
                    Directory.Delete(@"C:\Parser\", true);
                }
                Directory.CreateDirectory(@"C:\Parser\");
                DownloadFile(educational_lit_url);
            }

            public void DownloadFile(string urlAddress)
            {
                using (webClient = new WebClient())
                {
                    webClient.DownloadFileCompleted += new AsyncCompletedEventHandler(Completed);
                    webClient.DownloadProgressChanged += new DownloadProgressChangedEventHandler(ProgressChanged);
                    Uri Uri = new Uri(urlAddress);
                    sw.Start();
                    try
                    {
                        webClient.DownloadFileAsync(Uri, files_path);
                    }
                    catch (Exception ex)
                    {
                        MessageBox.Show(ex.Message);
                    }
                }
            }

            private void ProgressChanged(object sender, DownloadProgressChangedEventArgs e)
            {
                labelSpeed.Text = string.Format("{0} kb/s", (e.BytesReceived / 1024d / sw.Elapsed.TotalSeconds).ToString("0.00"));
                progressBar1.Value = e.ProgressPercentage;
                labelPerc.Text = e.ProgressPercentage.ToString() + "%";
                labelDownloaded.Text = string.Format("{0} MB's / {1} MB's",
                    (e.BytesReceived / 1024d / 1024d).ToString("0.00"),
                    (e.TotalBytesToReceive / 1024d / 1024d).ToString("0.00"));
            }

            private void Completed(object sender, AsyncCompletedEventArgs e)
            {
                sw.Reset();
                if (e.Cancelled == true)
                {
                    MessageBox.Show("Download was cancelled");
                }
                else
                {
                    MessageBox.Show("Downloaded and unpacked");
                    button1.Enabled = true;
                    ZipFile.ExtractToDirectory(files_path, @"C:\Parser\");
                }
            }

            private void button1_Click(object sender, EventArgs e)
            {
                LoadXml.ReadXML(@"C:\Parser\1139697.xml");
            }
        }
    }

There is plenty of my own sloppy code in here, so I will be glad of any advice. There are no problems with the code above, but as soon as I start reading the XML

    private void button1_Click(object sender, EventArgs e)
    {
        LoadXml.ReadXML(@"C:\Parser\1139697.xml");
    }

the program works and reads everything, but the form freezes completely.

    public async void ReadXML(string xml)
    {
        try
        {
            last_record_in_db = context.Books.OrderByDescending(b => b.BookId).FirstOrDefault()?.Book_id_ozon;
        }
        catch
        {
            MessageBox.Show("Cannot connect to the database");
        }

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.Async = true;
        settings.DtdProcessing = DtdProcessing.Parse;
        settings.ValidationType = ValidationType.DTD;
        settings.ValidationEventHandler += new ValidationEventHandler(ValidationCallBack);

        int start_record = 0;
        using (XmlReader reader = XmlReader.Create(xml, settings))
        {
            while (await reader.ReadAsync())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    if (reader.Name == "offer")
                    {
                        XmlDocument xmlDoc = new XmlDocument();
                        xmlDoc.LoadXml(reader.ReadOuterXml());
                        XmlNode book_node = xmlDoc.SelectSingleNode("offer");

                        if (start_record == 1 || last_record_in_db == null)
                        {
                            start_record = 1;
                            RecordFromOzonToDB(book_node);
                        }

                        int ozon_book_id = Convert.ToInt32(book_node.Attributes["id"].Value);

                        // As soon as the record read from the XML matches the last record
                        // in the database, start loading the remaining data into the database.
                        if (last_record_in_db == ozon_book_id)
                        {
                            // Writing starts from the next iteration.
                            start_record = 1;
                        }
                    }
                }
            }
        }
    }

I made the XML-reading method asynchronous, thinking it would help, but alas. Could someone explain in simple terms how to work with threads, and what I should do with this method so that it does not freeze the whole form? Code: http://pastebin.com/X8uF2dfd

Adding an XML example:

    <offer id="146881" type="book" available="true" group_id="146881">
      <url>http://www.ozon.ru/context/detail/id/146881/?from=prt_xml_facet</url>
      <price>379</price>
      <currencyId>RUR</currencyId>
      <categoryId>1137636</categoryId>
      <picture>http://static.ozone.ru/multimedia/books_covers/1004510046.jpg</picture>
      <store>false</store>
      <pickup>true</pickup>
      <delivery>true</delivery>
      <local_delivery_cost>299</local_delivery_cost>
      <author>Лаура Камбурнак</author>
      <name>Атлас животных</name>
      <publisher>Русич</publisher>
      <year>2004</year>
      <ISBN>978-5-8859-0680-7</ISBN>
      <language>Русский</language>
      <binding>70x108/8</binding>
      <page_extent>48</page_extent>
      <table_of_contents>Атлас животных</table_of_contents>
      <description>Предназначенная для детей в возрасте от 5 до 8 лет, эта красочная книга позволит им открыть для себя богатейший животный мир нашей планеты. Рассматривая цветные картинки, каждая из которых сопровождается пояснительным текстом, юные читатели познакомятся на страницах атласа с фауной всех шести континентов нашей планеты. Они узнают много нового, порой необычного и удивительного, и смогут представить зверей, рыб и птиц в условиях их естественной среды обитания.</description>
      <sales_notes>Бесплатная доставка при заказе от 3500 рублей</sales_notes>
      <barcode>9785885906807</barcode>
      <weight>0.667</weight>
      <dimensions>1.000/27.000/33.300</dimensions>
      <param name="Вес" unit="г">667</param>
      <param name="Ширина упаковки" unit="мм">270</param>
      <param name="Высота упаковки" unit="мм">333</param>
      <param name="Глубина упаковки" unit="мм">10</param>
    </offer>
  • Give an example of your XML file (or a link to one). As the answer already says, one reason the form hangs is loading into an XmlDocument; another is the database work (judging by the name of the *ToDB method). Also state which technology you use to work with the database. - Alexander Petrov
  • @AlexanderPetrov The XML is very large, which is why I use a reader. I will post all the code on Pastebin. - shatoidil
  • In your code the XML is read asynchronously (which is good) until an offer node is encountered, and then that node is loaded synchronously (which is bad) into an XmlDocument and then into the database. Give at least an approximate size of these offer nodes (in lines or bytes). - Alexander Petrov
  • @shatoidil: context.SaveChanges(); on every iteration may be slow. Try removing the database work to check whether that is the problem. - VladD
  • @shatoidil: Well, if that really is the problem, we will find a way to solve it. If not, we will keep looking for the root cause. - VladD

2 answers

Investigation showed that the problem lies in the large volume of XML data, and also in accessing the database on every iteration.

As a simple solution, it makes sense to execute LoadXml.ReadXML in a separate thread:

    private void button1_Click(object sender, EventArgs e)
    {
        Task.Run(() => LoadXml.ReadXML(@"C:\Parser\1139697.xml"));
    }

It also makes sense to try accumulating the parsed books in a list and adding them all to the database at once, in a single batch.
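The batching idea could be sketched roughly as follows. This is only an illustration under assumptions: `Book`, `BatchWriter` and the batch size are hypothetical names that do not come from the question, and the actual database call is replaced by a delegate so that the example runs without a database.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the entity stored in the database.
public class Book
{
    public int OzonId { get; set; }
    public string Name { get; set; }
}

// Collects parsed books and passes them on in batches,
// instead of touching the database once per book.
public class BatchWriter
{
    private readonly int batchSize;
    private readonly Action<IReadOnlyList<Book>> saveBatch;
    private readonly List<Book> pending = new List<Book>();

    public BatchWriter(int batchSize, Action<IReadOnlyList<Book>> saveBatch)
    {
        this.batchSize = batchSize;
        this.saveBatch = saveBatch;
    }

    public void Add(Book book)
    {
        pending.Add(book);
        if (pending.Count >= batchSize) Flush();
    }

    // Sends whatever is pending; call once more after the last book.
    public void Flush()
    {
        if (pending.Count == 0) return;
        saveBatch(pending.ToArray()); // copy, so the receiver owns its batch
        pending.Clear();
    }
}

public static class BatchDemo
{
    public static void Main()
    {
        int calls = 0;
        // In the real program the delegate would do the database work,
        // for example one AddRange plus one SaveChanges per batch (assumption).
        var writer = new BatchWriter(1000, batch => calls++);

        for (int i = 0; i < 2500; i++)
            writer.Add(new Book { OzonId = i, Name = "book " + i });
        writer.Flush();

        Console.WriteLine(calls); // 3 batches: 1000 + 1000 + 500
    }
}
```

With 2500 books and a batch size of 1000, the save delegate is invoked three times rather than 2500 times. The right batch size depends on the data and the database, so treat 1000 as a placeholder.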

  • I would appreciate a comment explaining the downvote. - VladD

Adding async to the ReadXML method does not by itself mean that it will execute asynchronously. Try changing the return type of ReadXML to Task, add async to the click handler, and await the call inside the handler.
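A minimal sketch of that signature change, written as a console program so that it runs without a form. `CountOffersAsync` is a hypothetical simplification of ReadXML (it only counts offer elements and reads from a string instead of a file); the point is the `Task` return type, which lets the caller await completion and observe exceptions.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using System.Xml;

public static class AsyncXmlDemo
{
    // Returns Task<int> instead of void, so callers can await it.
    public static async Task<int> CountOffersAsync(string xml)
    {
        var settings = new XmlReaderSettings { Async = true };
        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (await reader.ReadAsync())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "offer")
                    count++;
            }
        }
        return count;
    }

    public static async Task Main()
    {
        int n = await CountOffersAsync("<offers><offer id='1'/><offer id='2'/></offers>");
        Console.WriteLine(n); // 2
    }
}
```

In the form, the handler would then presumably look like `private async void button1_Click(...) { await LoadXml.ReadXML(path); }`. As the comments below point out, this alone does not move the synchronous parts (XmlDocument loading, database calls) off the UI thread.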

  • Asynchronous != in a separate thread. - PashaPash
  • @PashaPash So several asynchronous operations can run in a single thread? - Buka
  • Yes. Asynchrony via async/await implies that, in general, all the code runs in one thread (which removes the need to call Invoke and think about synchronization). But, by some magic, that thread is released while it waits for a response from hardware (disk) and network resources. - PashaPash
  • The code can be moved to a separate thread with Task.Run (as in VladD's answer). - PashaPash
  • @PashaPash Thank you, I did not know that. - Buka
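To see the difference described in these comments, here is a small console sketch (the class and method names are made up for illustration). It only checks where code passed to Task.Run executes; it does not reproduce a UI thread.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadDemo
{
    // Reports whether the delegate given to Task.Run
    // executes on a thread-pool thread.
    public static bool IsPoolThreadInsideTaskRun()
    {
        return Task.Run(() => Thread.CurrentThread.IsThreadPoolThread)
                   .GetAwaiter().GetResult();
    }

    public static void Main()
    {
        // The thread that starts Main is an ordinary thread, not a pool thread.
        Console.WriteLine("Caller on pool thread:   " + Thread.CurrentThread.IsThreadPoolThread);
        Console.WriteLine("Task.Run on pool thread: " + IsPoolThreadInsideTaskRun());
    }
}
```

In a WinForms application the caller would be the UI thread, so work wrapped in Task.Run leaves it free to repaint and handle input, while any updates to controls still have to be marshalled back to the UI thread.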