I receive a string like this:

<div class="popup"> <div class="popup-top"></div> <div class="popup-middle"> <div>Тип обложки: обл - мягкий переплет (крепление скрепкой или клеем)</div> <div>Иллюстрации: Черно-белые + цветные</div> </div> <div class="popup-bottom"> </div></div> 

I need to extract the value Черно-белые + цветные from the element <div>Иллюстрации: Черно-белые + цветные</div>. That div may be absent, or the divs may come in a different order. The only reliable landmark is the label Иллюстрации:

I tried it like this:

  string illustration = htmlDocument.DocumentNode.SelectNodes("//div[@class='popup']").Where(x => x.InnerText.Contains("Иллюстрации:")).Select(s => s.InnerText).FirstOrDefault();

It runs without errors, but it returns all of the text:

 "Тип обложки: 7Бц - твердая, целлофанированная (или лакированная)Иллюстрации: Цветные" 

whereas I need only Цветные. Alternatively, is there a way to get just the contents of that one div? Like this:

 Иллюстрации: Черно-белые + цветные

Cutting off the label after that is not a problem.
  • string[] arr = div_str.Split(':'); string need_value = arr[arr.Length - 1]; where div_str holds all of the selected text. - creamsun
  • What if you change it to htmlDocument.DocumentNode.SelectNodes("//div").Where(x => x.InnerText.Contains("Иллюстрации:")).Select(s => s.InnerText).LastOrDefault(); ? - Surfin Bird
  • @SurfinBird It returns null :( - shatoidil
  • @shatoidil Strange, I just checked and it seems to work. - Surfin Bird
  • "<div class=\"popup\"> <div class=\"popup-top\"></div> <div class=\"popup-middle\"> <div>Тип обложки: обл - мягкий переплет (крепление скрепкой или клеем)</div> <div>Иллюстрации: Цветные</div> </div> <div class=\"popup-bottom\"> </div></div>" Try it with this. - shatoidil

2 answers

The problem is that your query selects the outer div node with the popup class and extracts its contents. InnerText returns the text of that node together with the text of all its nested nodes.

Change your XPath expression as follows:

 "//div[@class='popup']/*/div" 

and the nodes you need will be retrieved.

Since you only take data from one node, the first one, you can use the SelectSingleNode method. I also suggest letting XPath do the filtering:

 string illustration = htmlDocument.DocumentNode.SelectSingleNode("//div[@class='popup']/*/div[contains(.,'Иллюстрации:')]").InnerText;
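The question says the div may be absent. In that case SelectSingleNode returns null and reading .InnerText throws a NullReferenceException. Below is a minimal sketch of a null-safe variant that also cuts off the label. The class name IllustrationParser and the method GetIllustration are made up for illustration, and the HtmlAgilityPack package is assumed to be referenced.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
static class IllustrationParser
{
    private const string Label = "Иллюстрации:";

    // Returns the text that follows the label, or null when no such div exists.
    public static string GetIllustration(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches the expression.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(
            "//div[@class='popup']/*/div[contains(., '" + Label + "')]");
        if (node == null)
            return null;

        // Decode HTML entities, then keep only what comes after the label.
        string text = HtmlEntity.DeEntitize(node.InnerText);
        int index = text.IndexOf(Label, StringComparison.Ordinal);
        if (index < 0)
            return null;

        return text.Substring(index + Label.Length).Trim();
    }
}
```

For the markup from the question, IllustrationParser.GetIllustration(html) should yield Черно-белые + цветные, and null when the popup has no such div.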

    I solved the problem as follows:

      var illustration = htmlDocument.DocumentNode.SelectSingleNode("//*[contains(text(), 'Иллюстрации:')]").InnerText; 

    That is, the node is selected by its text content.
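One caveat about this expression: in XPath 1.0, contains(text(), ...) converts the node set to a string by taking only the first text-node child of each element. That is why it skips the outer popup div here (its first text node is whitespace) and lands on the inner one. The sketch below uses made-up markup where the label sits in a second text node, to compare it with a predicate that checks every text node. The class TextNodeDemo and the method Find are hypothetical names.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical demo class, not part of HtmlAgilityPack.
static class TextNodeDemo
{
    // Loads the markup and returns the first node matching the XPath, or null.
    public static HtmlNode Find(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.SelectSingleNode(xpath);
    }

    static void Main()
    {
        // Made-up markup: the label is in the second text node of the div.
        string html = "<div>Note<br>Иллюстрации: Цветные</div>";

        // Tests only the first text-node child ("Note"), so no match is expected.
        HtmlNode first = Find(html, "//*[contains(text(), 'Иллюстрации:')]");

        // Tests each text-node child in turn, so the div is expected to match.
        HtmlNode any = Find(html, "//*[text()[contains(., 'Иллюстрации:')]]");

        Console.WriteLine(first == null);
        Console.WriteLine(any != null);
    }
}
```

For the markup in the question both forms select the same inner div, so the difference only matters if a div mixes text with other child elements.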

    • Eh, I was a little late. I approve! - Alexander Petrov
    • @AlexanderPetrov But you explained well! - shatoidil