Problem with parsing a page in C#

Question

I have the following code:

    private string test()
    {
        HttpRequest req = new HttpRequest();
        req.Cookies = new CookieDictionary();
        HttpResponse resp = req.Post("The login/password page goes here");
        HttpResponse resp1 = req.Post("After logging in, it goes to this page, to pull out the data");
        string k = resp1.ToString();
        var htmlDoc = new HtmlAgilityPack.HtmlDocument();
        htmlDoc.LoadHtml(k);
        var node = htmlDoc.DocumentNode.SelectSingleNode("//*[@id='process']/table[2]/tbody/tr[2]/td[2]/span");
        return "Test" + node.InnerText;
    }

After this, the result is null. On some sites everything works fine. The path to the value is correct, and I get it via XPath. It works on some sites and not on others. Please help! After the error is thrown, I look at string k in the HTML visualizer, and it shows all the text I need. But when parsing, the result is still null.

What you need may be loaded later, so your implementation never receives this information.
@nick_n_a I checked. I even took a value that appears at the very beginning of the document, and it still does not parse.
Your question comes down to "how to write an XPath query".
Read about XPath here: msdn.microsoft.com/en-us/library/ms256115%28v=vs.110%29.aspx Without the exact source text you are querying (down to the byte), it is not possible to draw any conclusions.
I would like to note that not all sites adhere to the XML standard (correctly opened and closed tags and attributes). Therefore, this approach will not work for all sites.

pavel1787mego · Answer 1 · 2017-07-07T08:35:48

Try this! If the value is still null, change the value of Pattern.

    // Returns the loaded HTML page
    public HtmlAgilityPack.HtmlDocument GetHtmlDocument(string uri, string _method)
    {
        // Class that handles loading the page for parsing
        HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
        // Execute the page-load request with the given request method
        return web.Load(uri, _method);
    }

    string Pattern = "//*[@id='process']/table[2]/tbody/tr[2]/td[2]/span";
    // Get all nodes of this class in the HTML markup
    // (SelectNodes returns null when nothing matches)
    HtmlAgilityPack.HtmlNodeCollection nodes = GetHtmlDocument("https://", "GET").DocumentNode.SelectNodes(Pattern);
    // Iterate over all nodes at the specified location
    foreach (HtmlAgilityPack.HtmlNode node in nodes)
    {
        string value = node.InnerText;
    }
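
As for what "change the value in Pattern" might look like in practice, here is a minimal sketch. It rests on an assumption, not a confirmed diagnosis of the asker's page: browsers insert a tbody element into tables when they build the DOM, so an XPath taken from the browser may contain /tbody/ even when the raw HTML that HtmlAgilityPack parses does not. The class XPathProbe, the helper FirstMatchText, and the sample markup are made up for illustration. The helper tries several candidate patterns and checks for null instead of dereferencing the result directly.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathProbe
{
    // Hypothetical helper: returns the InnerText of the first node matched by
    // any of the candidate XPath expressions, or null if none of them match.
    public static string FirstMatchText(string html, params string[] patterns)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        foreach (string pattern in patterns)
        {
            // SelectSingleNode returns null (it does not throw) when nothing matches.
            HtmlNode node = doc.DocumentNode.SelectSingleNode(pattern);
            if (node != null)
            {
                return node.InnerText;
            }
        }
        return null;
    }

    public static void Main()
    {
        // Made-up markup: the raw HTML has no <tbody>, although a browser's
        // element inspector would show one inside each table.
        string html =
            "<div id='process'>" +
            "<table><tr><td>first table</td></tr></table>" +
            "<table><tr><td>a</td><td>b</td></tr>" +
            "<tr><td>c</td><td><span>42</span></td></tr></table>" +
            "</div>";

        // The pattern from the question, and the same pattern without /tbody/.
        string withTbody = "//*[@id='process']/table[2]/tbody/tr[2]/td[2]/span";
        string withoutTbody = "//*[@id='process']/table[2]/tr[2]/td[2]/span";

        string text = FirstMatchText(html, withTbody, withoutTbody);
        Console.WriteLine(text ?? "no match");
    }
}
```

If neither pattern matches the real page, the earlier comments still apply: the value may be loaded after the initial response, in which case no XPath over the downloaded HTML will find it.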
