I want to parse a page and extract some information from it. The problem is that if I repeat the request several times (not hundreds of times per minute in a loop, just 5-6 times within a few minutes), a different page starts being returned instead, with the message: "Unusually many requests have been received from your IP address. The robot protection system has decided that requests from this IP are being sent automatically and has restricted access."
From the browser, however, I can still open this site and follow links a hundred times in a row without ever being blocked.
Here is the simplified code:
    Document doc = Jsoup.connect("https://www.kinopoisk.ru/film/328/")
            .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36")
            .referrer("https://www.kinopoisk.ru")
            .get();
    System.out.println(doc);
    System.out.println("https://www.kinopoisk.ru/film/328/");

I deliberately print the result to the console to see which HTML is returned. When I simply open the link in a browser, it always returns the normal page; the page with the restriction message appears only when parsing.
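Instead of reading the printed HTML by eye each time, the check for the restriction page could be automated. Below is a minimal sketch, not a definitive implementation: the class name `BlockPageCheck`, the method `looksBlocked`, and the marker phrase are all hypothetical, and the real page presumably shows the message in its original language, so the marker would have to be copied from the actual response. With Jsoup, the text to test could be obtained from `doc.text()`.

```java
import java.util.Locale;

// Hypothetical helper: decides whether a fetched page looks like the
// "too many requests" restriction page rather than the normal film page.
class BlockPageCheck {

    // Returns true if the page text contains the marker phrase (case-insensitive).
    // A null page or null marker is treated as "not blocked".
    static boolean looksBlocked(String pageText, String marker) {
        if (pageText == null || marker == null) {
            return false;
        }
        return pageText.toLowerCase(Locale.ROOT)
                .contains(marker.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Assumed marker; replace it with the exact wording the site returns.
        String marker = "unusually many requests";

        // Sample strings standing in for the text of a parsed document.
        String blockedPage = "Unusually many requests have been received from your IP address.";
        String normalPage = "Film page: title, year, cast";

        System.out.println(looksBlocked(blockedPage, marker));
        System.out.println(looksBlocked(normalPage, marker));
    }
}
```

Logging the result of this check next to a timestamp for each request would show after how many requests the restriction starts and when it is lifted.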
This is not a serious problem in itself: the restriction is lifted after an hour or two, and the page becomes accessible again.
I just want to understand how it all works. Why does the restriction appear only when using Jsoup.connect(), while following the link in a browser always returns the normal page?