Here is my code:

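    import java.io.IOException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public class Main {
        public static void main(String[] args) throws IOException {
            // GET https://elitedrop.ru/?name=jsoup with a short User-Agent
            Document doc = Jsoup.connect("https://elitedrop.ru/")
                    .userAgent("Mozilla")
                    .data("name", "jsoup")
                    .get();
            System.out.println(doc);
        }
    }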

After I send this request, the site returns the following instead of the page's HTML:

    <html>
    <head></head>
    <body>
    <script>
    var xmlhttp = new XMLHttpRequest();
    function eraseCookieFromAllPaths(name) {
        var pathBits = location.pathname.split("/");
        var pathCurrent = " path=";
        document.cookie = name + "=; expires=Thu, 01-Jan-1970 00:00:01 GMT;";
        for (var i = 0; i < pathBits.length; i++) {
            pathCurrent += ((pathCurrent.substr(-1) != "/") ? "/" : "") + pathBits[i];
            document.cookie = name + "=; expires=Thu, 01-Jan-1970 00:00:01 GMT;" + pathCurrent + ";";
        }
    }
    eraseCookieFromAllPaths("BHC");
    xmlhttp.onreadystatechange=function() {
        if (xmlhttp.readyState==4 && xmlhttp.status==200) {
            var a=xmlhttp.responseText;
            document.cookie="BHC="+a+"; path=/";
            document.location.href="/?name=jsoup";
        }
    };
    xmlhttp.open("GET", "/banhammer/pid", true);
    xmlhttp.send();
    </script>
    </body>
    </html>

Please explain what the problem is. (Other sites return their HTML normally.)

    2 answers

    The site you are trying to parse most likely filters requests by the User-Agent header.

    To work with this site using Jsoup, you need to set a valid User-Agent, for example:

     Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36 

    Code:

        public static void main(String[] args) throws IOException {
            Document doc = Jsoup.connect("https://elitedrop.ru/")
                    .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36")
                    .data("name", "jsoup")
                    .get();
            System.out.println(doc);
        }

    This successfully prints the HTML of the requested page to the console.

    • It still prints the same thing. - Vadim
    • @Vadim, strange, setting the User-Agent solved the problem for me. Try going through a proxy. - post_zeew
    • No, it still prints the script I showed in my question. - Vadim
    • Could you send the code or a screenshot, please? - Vadim
    • @Vadim, the code is in the answer. As for a screenshot, why? - post_zeew

    Most likely the site has anti-bot / DDoS protection that relies on the fact that a real user's browser executes JavaScript. Jsoup does not execute JavaScript, so it only ever receives the challenge script and never follows it to the real page.

    • Could you tell me how to get around this protection? - Vadim
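
    Reading the script from the question, it appears to do three things: clear any old BHC cookie, request /banhammer/pid, and then store the response body in a BHC cookie before reloading /?name=jsoup. Below is a minimal sketch that repeats those steps by hand. It assumes the server accepts the cookie exactly as the script sets it and performs no further checks, which the thread does not confirm. To stay self-contained it uses the JDK's java.net.http.HttpClient (Java 11+) instead of Jsoup; the names BanhammerSketch, looksLikeChallenge and cookieHeader are made up for this illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BanhammerSketch {
    // Browser-like User-Agent taken from the first answer.
    static final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36";

    // True if the body looks like the challenge page from the question,
    // i.e. it contains the request to /banhammer/pid.
    static boolean looksLikeChallenge(String html) {
        return html.contains("/banhammer/pid");
    }

    // Cookie header value equivalent to document.cookie="BHC="+a in the script.
    // Trimming the token is an assumption, made so that a trailing newline
    // cannot end up inside the header value.
    static String cookieHeader(String token) {
        return "BHC=" + token.trim();
    }

    // Sends a GET request, optionally with a Cookie header, and returns the body.
    static String get(HttpClient client, String url, String cookie)
            throws IOException, InterruptedException {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", USER_AGENT)
                .GET();
        if (cookie != null) {
            builder.header("Cookie", cookie);
        }
        return client.send(builder.build(), HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String base = "https://elitedrop.ru";
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();

        // Step 1: first request; may return the challenge script instead of the page.
        String html = get(client, base + "/?name=jsoup", null);
        if (looksLikeChallenge(html)) {
            // Step 2: fetch the token the script would read from /banhammer/pid.
            String token = get(client, base + "/banhammer/pid", null);
            // Step 3: repeat the request with the BHC cookie set.
            html = get(client, base + "/?name=jsoup", cookieHeader(token));
        }
        System.out.println(html);
    }
}
```

    If this returns the real page, the string can be handed to Jsoup.parse(html, base) for the usual selector-based parsing. If the challenge page still comes back, the protection probably checks more than the cookie, and a tool that actually executes JavaScript (a headless browser) would be the next thing to try.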