The method uses HtmlAgilityPack to collect the links on a page, discarding everything unnecessary along the way (links to images, anchors, etc.). Each filter works fine on its own, but combined they throw an exception:

An unhandled exception of type 'System.NullReferenceException' occurred in Crawler.exe. Additional information: Object reference not set to an instance of an object.

Here is the method:

    public List<string> ParserHtml(string uri, string baseUrl)
    {
        var webGet = new HtmlWeb();
        var document = webGet.Load(uri);
        var linksOnPage = (from lnks in document.DocumentNode.Descendants()
                           where lnks.Name == "a" && lnks.Attributes["href"] != null
                           let lnk1 = lnks.Attributes["href"].Value
                           let lnk2 = !lnk1.Contains("rss") ? lnk1 : null
                           let lnk3 = !lnk2.StartsWith("mailto:") ? lnk2 : null
                           select lnk3
                          ).Distinct().ToList();
        return linksOnPage;
    }

Taken individually, these pieces of code work, for example if you do this:

    let lnk2 = !lnk1.Contains("rss") ? lnk1 : null
    //let lnk3 = !lnk2.StartsWith("mailto:") ? lnk2 : null
    select lnk2

or this:

    //let lnk2 = !lnk1.Contains("rss") ? lnk1 : null
    let lnk3 = !lnk1.StartsWith("mailto:") ? lnk1 : null
    select lnk3

In other words, each filter works separately, and the error occurs only when lnk3 is computed from lnk2, or vice versa.

    1 answer

    Suppose the query reaches a link that contains rss.
    Computing lnk2 then yields null, i.e. there is no object, and computing lnk3 tries to call a method (StartsWith) on that null, which throws the NullReferenceException.
    If lnk2 is not used anywhere else, replace let lnk2 = !lnk1.Contains("rss") ? lnk1 : null with let lnk2 = !lnk1.Contains("rss") ? lnk1 : "". This will not break the query logic, although rss links then come through as empty strings rather than null.
    Or handle the null when computing lnk3: let lnk3 = lnk2 == null ? null : !lnk2.StartsWith("mailto:") ? lnk2 : null
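A third option, not from the original answer, is to avoid producing null at all by expressing each filter as a where clause, so rejected links are dropped instead of being carried forward as null or "". The following is only a minimal sketch under stated assumptions: LinkFilter and FilterLinks are hypothetical names, and the input is a plain sequence of href strings rather than HtmlAgilityPack nodes, so the example runs without the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original code or of HtmlAgilityPack.
public static class LinkFilter
{
    // Takes raw href values and keeps only the ones that pass every filter.
    public static List<string> FilterLinks(IEnumerable<string> hrefs)
    {
        return (from href in hrefs
                where !string.IsNullOrEmpty(href)                          // skip missing or empty values
                where !href.Contains("rss")                                // drop rss links
                where !href.StartsWith("mailto:", StringComparison.Ordinal) // drop mail links
                select href)
               .Distinct()
               .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample input is made up for illustration.
        var hrefs = new[] { "/news", "/feed/rss", "mailto:a@b.c", "/news", null };

        foreach (var link in LinkFilter.FilterLinks(hrefs))
        {
            Console.WriteLine(link);
        }
    }
}
```

In the original method the same where clauses could follow let lnk1 = lnks.Attributes["href"].Value, with select lnk1 at the end. Because no intermediate value is ever null, the order of the filters no longer matters, and the result list contains only usable links.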