I need help. I need to parse an HTML page after clicking a button on it. How can I do this with HtmlAgilityPack?

  • It's simple, just give me 5 seconds - alex-rudenkiy

1 answer

Here is a simple example that parses the elements with the "li" tag on a Wikipedia page. This library has several Get* methods, so you can select elements not only by tag but also, for example, by id or by name.

    using System.IO;
    using System.Net;
    using System.Text;
    // Assumption: HtmlParser below looks like AngleSharp's parser (0.9.x namespace);
    // adjust this using if your library differs.
    using AngleSharp.Parser.Html;

    public string GetHTML(string urlAddress)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);

        // The using blocks close the response and the reader on every path.
        // In the original, the Close() calls sat after "return data" and never ran.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            if (response.StatusCode != HttpStatusCode.OK)
                return null;

            using (Stream receiveStream = response.GetResponseStream())
            using (StreamReader readStream = string.IsNullOrEmpty(response.CharacterSet)
                ? new StreamReader(receiveStream)
                : new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet)))
            {
                return readStream.ReadToEnd();
            }
        }
    }

    public void parse()
    {
        var parser = new HtmlParser();
        var document1 = parser.Parse(GetHTML("https://ru.wikipedia.org/wiki/Заглавная_страница"));
        // Collect every <li> element of the downloaded page.
        var Tab = document1.GetElementsByTagName("li");
    }
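Since the question asks about HtmlAgilityPack specifically, the same kind of selection could be sketched with it roughly as follows. This is only an illustration: the class and method names (`ListItemExtractor`, `GetListItems`, `GetTextById`) are made up for the example, and the HTML string is assumed to have been downloaded already, for instance with a method like GetHTML above.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper class, not part of HtmlAgilityPack itself.
public static class ListItemExtractor
{
    // Returns the trimmed inner text of every <li> element in the given HTML.
    public static List<string> GetListItems(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//li");
        if (nodes == null)
            return new List<string>();

        return nodes.Select(n => n.InnerText.Trim()).ToList();
    }

    // Looks up a single element by its id attribute; returns null if it is absent.
    public static string GetTextById(string html, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode node = doc.GetElementbyId(id);
        return node?.InnerText.Trim();
    }
}
```

Keep in mind that HtmlAgilityPack only parses the HTML text it is given. It does not run JavaScript and cannot click anything, so the string passed in has to be the content you actually want to read.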

But it is better to do as EvgeniyZ advised: just download the JSON and parse it.
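A minimal sketch of that JSON route, assuming .NET Core 3.0 or later for System.Text.Json. The URL is whatever XHR request you find in the browser's Network tab, and the payload shape used here (an "events" array with "minute" and "text" fields) is purely hypothetical, as are the names `ProtocolJson`, `DownloadAsync` and `ReadEvents`. Adapt them to what the site really returns.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper class for illustration only.
public static class ProtocolJson
{
    private static readonly HttpClient Http = new HttpClient();

    // Downloads the raw JSON text from the endpoint seen in the Network tab.
    public static Task<string> DownloadAsync(string jsonUrl)
    {
        return Http.GetStringAsync(jsonUrl);
    }

    // Assumed payload shape (hypothetical):
    // { "events": [ { "minute": 12, "text": "Goal" }, ... ] }
    public static List<string> ReadEvents(string json)
    {
        var result = new List<string>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            foreach (JsonElement e in doc.RootElement.GetProperty("events").EnumerateArray())
            {
                int minute = e.GetProperty("minute").GetInt32();
                string text = e.GetProperty("text").GetString();
                result.Add(minute + "' " + text);
            }
        }
        return result;
    }
}
```

Usage would be along the lines of `ProtocolJson.ReadEvents(await ProtocolJson.DownloadAsync(url))`. Splitting the download from the parsing lets you check the parsing against a saved response without touching the network.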

  • The question was different. Let me describe the situation: 1) I open a web page. 2) It contains some content (text). 3) I click a button on this page. 4) The content of the page changes. So the actual question is how to parse the modified page, not the original one, using HtmlAgilityPack. - Nick Laptev
  • sport-express.ru/football/friendly/clubs/… is the link to the site. There is a "Protocol" button. I want to parse the page after clicking this button. - Nick Laptev
  • @NickLaptev So is it like this: you add a WebBrowser component, click the button yourself, and then it parses? Or should everything happen automatically? - alex-rudenkiy
  • You can always find an easier way. - EvgeniyZ
  • @alex-rudenkiy This site does not encrypt its data, so you can just look in the browser. Say we take Chrome: press F12, open the Network tab and reload the page. Sort by the Type column, look for the xhr entries, and you will see a request with ?json=1 there. Some sites do not expose this so obviously, and you have to capture the requests with Fiddler or similar programs, or parse the JS scripts... - EvgeniyZ