There is an https URL.

For example: https://www.virustotal.com/en/domain/top.list.ru/information/

I need to get its HTML source, without downloading images, CSS and the like. Just the HTML code.

I tried all the methods recommended on the English-language Stack Overflow, but for some reason they all return an empty string for https. (I did not test any of them on http, but judging by the number of upvotes on the answers, they work fine with http.)

I also tried setting headers such as User-Agent manually. The result is the same: an empty string is returned.

An example of what I tried:

    public static string GetHtmlFromUrl(string url)
    {
        WebClient webClient = new WebClient();
        return webClient.DownloadString(url);
    }

and

    public static string GetHtmlFromUrl(string url)
    {
        if (url.Length > 0)
        {
            Uri myUri = new Uri(url);

            // Create an HttpWebRequest object for the specified URL.
            HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(myUri);

            // Set the user agent as if we were a web browser.
            myHttpWebRequest.UserAgent = @"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0";

            // The using blocks release the response and the reader even if reading throws.
            using (HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse())
            using (var stream = myHttpWebResponse.GetResponseStream())
            using (var reader = new StreamReader(stream))
            {
                return reader.ReadToEnd();
            }
        }
        else
        {
            return "NO URL";
        }
    }
  • Why parse HTML? The site has an official API: virustotal.com/en/documentation/public-api - PashaPash
  • @PashaPash Because the question concerns not only the VirusTotal site but also other sites that have no API of their own. This was just an example. Besides, in my case I am not sure the API has a method that returns the information I need. At least I did not find one :) - Andrew
  • Isn't "Retrieving domain reports" at the link above what you need? And the question does seem to concern a specific site: only VirusTotal has this strange restriction on headers over https. Other sites have their own quirks. - PashaPash
  • It does say "for example" :) In general I was interested in the question for other sites too, because the problem has come up more than once. The other times I just did not find a way around it :) - Andrew
  • Pulling data from sites by parsing HTML is not quite the right approach :) - PashaPash

1 answer

Because the site really does return an empty body. Look at the traffic in a sniffer: the response has Content-Length: 0.

So the site does not like the request and needs more headers before it stops mistaking you for a bot. For example, like this:

    WebClient webClient = new WebClient();
    webClient.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0");
    webClient.Headers.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
    webClient.Headers.Add("Accept-Language", "ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3");

With these headers, the site already returns content.
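
Putting the pieces together, the helper from the question could be combined with the headers above roughly as follows. This is only a minimal sketch: the class name `HtmlFetcher`, the method `CreateBrowserLikeClient`, and the UTF-8 encoding are assumptions made for illustration, and other sites may well expect a different set of headers.

```csharp
using System;
using System.Net;
using System.Text;

public static class HtmlFetcher
{
    // Hypothetical helper: builds a WebClient that sends the same
    // browser-like headers as in the answer above.
    public static WebClient CreateBrowserLikeClient()
    {
        var webClient = new WebClient();

        // Assumption: the page is served as UTF-8; adjust for other sites.
        webClient.Encoding = Encoding.UTF8;

        webClient.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0");
        webClient.Headers.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        webClient.Headers.Add("Accept-Language", "ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3");
        return webClient;
    }

    // Downloads only the HTML document itself; images, CSS and scripts
    // referenced by the page are not requested.
    public static string GetHtmlFromUrl(string url)
    {
        if (string.IsNullOrWhiteSpace(url))
        {
            throw new ArgumentException("URL must not be empty.", nameof(url));
        }

        // A fresh client per call, disposed when the download finishes.
        using (WebClient webClient = CreateBrowserLikeClient())
        {
            return webClient.DownloadString(url);
        }
    }
}
```

Calling `HtmlFetcher.GetHtmlFromUrl("https://www.virustotal.com/en/domain/top.list.ru/information/")` should then return the page markup, provided the site still accepts this set of headers. Unlike the second snippet in the question, this sketch throws on an empty URL instead of returning the string "NO URL", so that an error cannot be confused with page content.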