C#: Get the HTML code of a page by its URL (HTTPS GET request)

Question

I have an HTTPS URL.

For example: https://www.virustotal.com/en/domain/top.list.ru/information/

I need to get its HTML source, without downloading images, CSS, and the like. Just the HTML code.

I tried all the methods recommended on the English-language Stack Overflow, but for some reason they all return an empty string for HTTPS. (I did not test any of them over HTTP, but judging by the number of upvotes on those answers, they work fine with HTTP.)

I also tried manually setting headers such as User-Agent. The result is the same: an empty string is returned.

An example of what I tried:

    public static string GetHtmlFromUrl(string url)
    {
        WebClient webClient = new WebClient();
        return webClient.DownloadString(url);
    }

and

    public static string GetHtmlFromUrl(string url)
    {
        if (url.Length > 0)
        {
            Uri myUri = new Uri(url);
            // Create a 'HttpWebRequest' object for the specified url.
            HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(myUri);
            // Set the user agent as if we were a web browser
            myHttpWebRequest.UserAgent = @"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0";
            HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();
            var stream = myHttpWebResponse.GetResponseStream();
            var reader = new StreamReader(stream);
            var html = reader.ReadToEnd();
            // Release resources of response object.
            myHttpWebResponse.Close();
            return html;
        }
        else
        {
            return "NO URL";
        }
    }

The site has an official API: virustotal.com/en/documentation/public-api
@PashaPash because the question concerns not only the VirusTotal site, but also other sites that do not have their own API.
Besides, in my case it is not certain that the API has a method that gives me the information I need.
And the question does seem to concern a specific site: after all, only VirusTotal has this strange restriction on headers over HTTPS.
It says "for example" :) In general, I was interested in the question regarding other sites, too, because
Pulling data from sites by HTML parsing is not quite the right way :)

vitidev · Accepted Answer · 2016-04-20T15:52:53

Because the site really does return an empty body. Look at the traffic in a sniffer: the response has Content-Length: 0.

So the site does not like the request and needs additional headers so that you are not mistaken for a bot. For example:

    WebClient webClient = new WebClient();
    webClient.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0");
    webClient.Headers.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
    webClient.Headers.Add("Accept-Language", "ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3");

With these headers set, the request already returns content.
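For completeness, here is a minimal sketch of how the browser-like headers could be folded into the `GetHtmlFromUrl` method from the question. The class name `HtmlFetcher` and the helper `CreateBrowserLikeClient` are hypothetical names introduced only for this illustration. The header values are the ones from the answer, and whether a given site accepts them is an assumption that has to be checked against that site.

```csharp
using System;
using System.Net;

// Hypothetical helper class, introduced only for this sketch.
public static class HtmlFetcher
{
    // Builds a WebClient that sends the browser-like headers from the answer.
    public static WebClient CreateBrowserLikeClient()
    {
        var client = new WebClient();
        client.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0");
        client.Headers.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        client.Headers.Add("Accept-Language", "ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3");
        return client;
    }

    // Downloads only the HTML document at the URL; linked images, CSS and
    // scripts are not requested, because nothing parses the page.
    public static string GetHtmlFromUrl(string url)
    {
        if (string.IsNullOrWhiteSpace(url))
            throw new ArgumentException("URL must not be empty.", nameof(url));

        // A fresh client per call, disposed as soon as the download finishes.
        using (var client = CreateBrowserLikeClient())
        {
            return client.DownloadString(url);
        }
    }
}
```

It could be called as `string html = HtmlFetcher.GetHtmlFromUrl("https://www.virustotal.com/en/domain/top.list.ru/information/");`. Throwing on an empty URL instead of returning "NO URL" is a design choice of this sketch, so that a missing URL cannot be mistaken for page content.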
