Here is the HTML snippet:

    <div>
      <div>
        <a></a>
        <a></a>
        <div><a><span></span>Text1</a></div>
      </div>
      <div>Text2</div>
    </div>

Using this code:

    var htmlNodes = htmlDoc.DocumentNode.SelectNodes("*");
    foreach (var node in htmlNodes)
    {
        text += node.InnerText;
    }

I get this string:

 "\r\n \r\n \r\n \r\n \r\n Text1\r\n Text2" 

Can I extract just the text, like this?

 "Text1 Text2" 
  • SelectNodes("//text()[normalize-space(.) != '']") - Alexander Petrov
  • @AlexanderPetrov Damn, I forgot that XPath can be used here. Thanks a lot, I would have gone down the wrong path for a long time. - Vipz
  • Without XPath, using LINQ: Descendants().OfType<HtmlTextNode>().Where(n => !string.IsNullOrWhiteSpace(n.InnerText)) - Alexander Petrov
  • @AlexanderPetrov Thank you, I have not gotten around to studying LINQ yet. - Vipz
  • @AlexanderPetrov A follow-up question: can I get the text of the entire fragment at once, without concatenation? I have tried various things, but nothing works, for example htmlDoc.DocumentNode.SelectSingleNode("//text()[normalize-space(.)]").InnerText; Here DocumentNode is exactly the HTML snippet above. - Vipz
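
On the follow-up about SelectSingleNode: it returns only the first node that matches the expression, so by itself it cannot yield the text of the whole fragment. A minimal sketch of one way around this, assuming the HtmlAgilityPack NuGet package is referenced, is to select all non-blank text nodes with the XPath from the comment above and join them. The names TextExtractor and ExtractText are hypothetical, introduced only for this illustration.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack package is installed

public static class TextExtractor
{
    // Hypothetical helper: collects every non-blank text node
    // and joins the trimmed values with a single space.
    public static string ExtractText(string html)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var textNodes = htmlDoc.DocumentNode
            .SelectNodes("//text()[normalize-space(.) != '']");
        if (textNodes == null)
        {
            return string.Empty;
        }

        return string.Join(" ", textNodes.Select(n => n.InnerText.Trim()));
    }

    public static void Main()
    {
        string html =
            "<div>\r\n <div>\r\n <a></a>\r\n <a></a>\r\n" +
            " <div><a><span></span>Text1</a></div>\r\n </div>\r\n" +
            " <div>Text2</div>\r\n</div>";

        Console.WriteLine(ExtractText(html));
    }
}
```

The joining still happens in C# rather than in XPath. XPath 1.0, which is what HtmlAgilityPack's SelectNodes evaluates, has no function that joins a node set with a separator, so a single expression returning the combined string is probably not available here.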

2 answers

    using System;
    using System.Linq;

    string text = "\r\n \r\n \r\n \r\n \r\n Text1\r\n Text2";

    // Drop the line-break characters, then trim both ends.
    var str3 = new String(text.Where(ch => ch != '\r' && ch != '\n').ToArray()).Trim();

    // Split on spaces and discard the empty entries left by runs of spaces.
    var str4 = str3.Split(' ').Where(s => s != String.Empty).ToArray();

    // Join the remaining words with a single space.
    string finalstring = String.Join(" ", str4);
  • Great! It works for me in all cases, but can't all of this be replaced with a single XPath? Surely it is possible, but I cannot find the right expression: text = htmlDoc.DocumentNode.SelectSingleNode("//text()[normalize-space(.)]").InnerText; - Vipz
 var str3 = new String(text.Where(ch => ch != '\r' && ch != '\n').ToArray()).Trim(); 
  • Do you need a single space between the parts of the string? - Roman Ieromenko
  • //*[normalize-space()='" + text + "'][last()] - Vipz
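
If a single space between the parts is what is needed, the cleanup can also be done on the already concatenated string. The sketch below is one possible approach rather than the method used in the answers above: it relies on String.Split treating a null separator array as "split on any whitespace". The names WhitespaceHelper and CollapseWhitespace are hypothetical.

```csharp
using System;

public static class WhitespaceHelper
{
    // Hypothetical helper: collapses every run of whitespace
    // (spaces, \r, \n, tabs) into one space and trims both ends.
    public static string CollapseWhitespace(string text)
    {
        // A null separator array makes Split break on whitespace characters;
        // RemoveEmptyEntries drops the empty pieces between adjacent separators.
        string[] parts = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", parts);
    }

    public static void Main()
    {
        string text = "\r\n \r\n \r\n \r\n \r\n Text1\r\n Text2";
        Console.WriteLine(CollapseWhitespace(text)); // prints "Text1 Text2"
    }
}
```

Unlike removing '\r' and '\n' outright, this treats a line break as a separator, so words divided only by a line break are not glued together.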