This method is supposed to use HtmlAgilityPack to collect the links from a page, discarding everything unnecessary along the way (links to images, anchors, etc.). Each filter works fine on its own, but together they throw an error:
An unhandled exception of type 'System.NullReferenceException' occurred in Crawler.exe. Additional information: Object reference not set to an instance of an object.
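For context, this is the exception the .NET runtime throws whenever an instance member is invoked on a null reference; here is a minimal standalone illustration (my own example, unrelated to the crawler code):

    using System;

    class NreDemo
    {
        static void Main()
        {
            string s = null;
            // Any instance method call on a null string throws
            // System.NullReferenceException at runtime.
            Console.WriteLine(s.StartsWith("mailto:"));
        }
    }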
Here is the method:
    public List<string> ParserHtml(string uri, string baseUrl)
    {
        var webGet = new HtmlWeb();
        var document = webGet.Load(uri);
        var linksOnPage = (from lnks in document.DocumentNode.Descendants()
                           where lnks.Name == "a" && lnks.Attributes["href"] != null
                           let lnk1 = lnks.Attributes["href"].Value
                           let lnk2 = !lnk1.Contains("rss") ? lnk1 : null
                           let lnk3 = !lnk2.StartsWith("mailto:") ? lnk2 : null
                           select lnk3).Distinct().ToList();
        return linksOnPage;
    }

Taken individually, these pieces of code work. For example, this runs fine:
    let lnk2 = !lnk1.Contains("rss") ? lnk1 : null
    //let lnk3 = !lnk2.StartsWith("mailto:") ? lnk2 : null
    select lnk2

and so does this:
    //let lnk2 = !lnk1.Contains("rss") ? lnk1 : null
    let lnk3 = !lnk1.StartsWith("mailto:") ? lnk1 : null
    select lnk3

Why does each variant work on its own but not together? That is, the error occurs when lnk3 is computed from lnk2, and vice versa.
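My guess is that the problem lies in the null substitution: when lnk1 contains "rss", lnk2 becomes null, and the next let binding then calls lnk2.StartsWith("mailto:") on a null reference. If that is indeed the cause, a variant that filters with where clauses instead of substituting null should avoid the dereference entirely. A sketch of that idea (untested, and based only on my assumption about the cause):

    using System.Collections.Generic;
    using System.Linq;
    using HtmlAgilityPack;

    public List<string> ParserHtml(string uri, string baseUrl)
    {
        var webGet = new HtmlWeb();
        var document = webGet.Load(uri);
        // Drop unwanted links with where clauses instead of replacing
        // them with null, so no later step dereferences a null string.
        var linksOnPage = (from lnks in document.DocumentNode.Descendants()
                           where lnks.Name == "a" && lnks.Attributes["href"] != null
                           let href = lnks.Attributes["href"].Value
                           where !href.Contains("rss") && !href.StartsWith("mailto:")
                           select href).Distinct().ToList();
        return linksOnPage;
    }

Would filtering this way be the right approach, or is there a cleaner fix for the original query?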