Optimal protection would be to block parsers and to prevent information from being copied off the site.

The second part is easy to solve, but how do you determine that a parser/bot has come to the site? When visiting a site, a user leaves some information about their browser and their IP, so the first thing you could do is deny access if the browser has not sent its headers (or whatever the data a browser sends when connecting to the server is called). The trouble is that the vast majority of parsers work through a browser anyway.

So the question is: how can this problem be solved, at least in theory?
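
For illustration, here is a minimal sketch of the header check described above, assuming a Node.js server with Express (the framework choice, route and port are arbitrary). It rejects requests that arrive without any User-Agent header; as discussed in the answers below, a bot can trivially fake this header, so it only stops the most primitive scrapers.

    import express from "express";

    const app = express();

    // Reject requests that arrive without a User-Agent header.
    // Any bot can trivially fake this header, so this only filters
    // out the laziest scrapers.
    app.use((req, res, next) => {
      const userAgent = req.headers["user-agent"];
      if (!userAgent) {
        res.status(403).send("Forbidden");
        return;
      }
      next();
    });

    app.get("/", (_req, res) => {
      res.send("Hello");
    });

    app.listen(3000);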

  • 2
    Don't you think the question itself is immoral? Information should be free (if a visitor can read it with their eyes, you cannot forbid them to copy it). The question of honest quoting, of course, still stands. - avp
  • 1
    Immoral? The question is about blocking parsers and bots, and there is nothing wrong with that. If a user copies everything by hand, good luck to them; what is immoral is using “malicious” programs, not trying to protect against them. - cuthalir
  • 1
    If the program's behavior is malicious (the same goes for a user), then defending against it (including actively) is the right thing to do. The question is whether copying information is malicious at all. - avp
  • 2
    Of course it is harmful: the search engine no longer sees the information as unique, a pile of third-party links that differ from the original shows up in search results, and that hurts the ranking of the original source. And there is plenty more harm besides ;( - cuthalir
  • 1
    In my opinion, stealing someone else's work is what is immoral, and defending against theft is NOT immoral, it is normal. And what does freedom have to do with it? Am I forbidding anyone from saying whatever they want on the Internet? No. I spend my time writing articles, and someone simply steals them and puts them on their own site, and in 90% of cases does not even give a link but passes them off as their own. - Ozim

4 Answers

A parser is no different from a browser. About the only thing you can check is the speed of navigation through the pages, and even that can be bypassed...

So the actual answer is: a bot cannot be identified with 100% certainty at all.

All such judgments are based on heuristics and assumptions.
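
As a rough illustration of the navigation-speed heuristic, here is a sketch of a per-IP rate limit written as Express middleware; the framework choice, the window size and the threshold are all assumptions to be tuned per site. It only flags suspiciously fast clients, and a patient bot slips under any threshold.

    import express from "express";

    const app = express();

    // Track request timestamps per IP and reject clients that page
    // through the site faster than a human plausibly would.
    const WINDOW_MS = 60_000;   // look at the last minute
    const MAX_REQUESTS = 30;    // arbitrary threshold, tune per site
    const hits = new Map<string, number[]>();

    app.use((req, res, next) => {
      const now = Date.now();
      const ip = req.ip ?? "unknown";
      const recent = (hits.get(ip) ?? []).filter(t => now - t < WINDOW_MS);
      recent.push(now);
      hits.set(ip, recent);

      if (recent.length > MAX_REQUESTS) {
        res.status(429).send("Too Many Requests");
        return;
      }
      next();
    });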

  • OK. When I visit the site I pass along some information, right? And a parser/bot does not? Although, I'm being slow here... A parser mostly runs through a browser anyway, connects to the site over the same protocol and sends all the same data... Is there really no way to capture anything unique about it? - cuthalir
  • 1
    Well, you can always search for your own content in Google afterwards... - Alex Kapustin
  • 1
    By default, curl and wget do not pass a browser's User-Agent information. - Sergey
  • 1
    Nothing prevents the bot's author from setting any headers exactly as real browsers send them, and then they cannot be told apart (see the sketch below these comments). - cy6erGn0m
  • @Sergey curl works over the HTTP/HTTPS protocol, so there is no way to tell what is connecting to the site: a bot or an anonymous visitor. That is exactly the point: parsers mostly work through curl, which is why they cannot be tracked =( - cuthalir
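
To illustrate the point made in the comments above, here is a sketch of how trivially a scraper can present itself as a regular browser. The URL and header values are made up, and any HTTP client (curl, wget, fetch, and so on) can do the same.

    // Any HTTP client can send the same headers a real browser sends,
    // so header checks alone cannot tell a bot from a person.
    async function fetchAsBrowser(url: string): Promise<string> {
      const response = await fetch(url, {
        headers: {
          "User-Agent":
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
            "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
          "Accept": "text/html,application/xhtml+xml",
          "Accept-Language": "en-US,en;q=0.9",
        },
      });
      return response.text();
    }

    // Hypothetical URL, purely for illustration.
    fetchAsBrowser("https://example.com/article").then(html => console.log(html.length));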

You can only protect the information from the user, and only if you know JS; the page source itself cannot be protected. Still, if some bot is trying to rip off your page, you can try serving only the skeleton of the page first and loading the content via AJAX afterwards (a sketch is below), but then you can forget about search engines) AJAX is not indexed yet.
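
A minimal client-side sketch of that approach: the HTML contains only a skeleton with an empty container, and the real content is pulled in after the page loads. The /api/content endpoint and the element id are hypothetical.

    // Runs in the browser: fetch the article body after the skeleton
    // page has loaded. A scraper that does not execute JavaScript
    // only ever sees the empty skeleton.
    document.addEventListener("DOMContentLoaded", async () => {
      const container = document.getElementById("content");
      if (!container) return;

      const res = await fetch("/api/content"); // hypothetical endpoint
      const data = await res.json();
      container.innerHTML = data.html;
    });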

  • It is already being indexed, I heard in passing. Google, it seems, and Yandex. - FoxManiac

You can convert the text into graphics, i.e. render it as an image. The upside: nobody can simply copy the text, and if you tinker with a watermark, even OCR may not help. On the other hand, as a user I simply would not visit such a site, because comfort matters to me, and with text rendered as images there is none: the text can be neither copied nor zoomed, and everything loads slowly. Still, this technique is an option when you need to hide something from search engines and prevent direct copy-pasting of sensitive information, for example the mobile phone number of a seller on a flea-market site.
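
A sketch of rendering a piece of text as a PNG on the server, assuming Node.js with the node-canvas package; the sizes, fonts, output path and watermark text are made up. A faint rotated watermark is drawn underneath to make automated OCR a little harder.

    import { createCanvas } from "canvas"; // node-canvas package
    import { writeFileSync } from "fs";

    // Render sensitive text (e.g. a phone number) as an image so it
    // cannot be selected, copied or indexed as plain text.
    function renderTextAsImage(text: string): Buffer {
      const canvas = createCanvas(300, 60);
      const ctx = canvas.getContext("2d");

      ctx.fillStyle = "#ffffff";
      ctx.fillRect(0, 0, 300, 60);

      // Faint diagonal watermark to confuse OCR a little.
      ctx.save();
      ctx.globalAlpha = 0.15;
      ctx.fillStyle = "#000000";
      ctx.font = "14px sans-serif";
      ctx.rotate(-0.2);
      ctx.fillText("example.com", 10, 50);
      ctx.restore();

      ctx.fillStyle = "#222222";
      ctx.font = "24px sans-serif";
      ctx.fillText(text, 10, 38);

      return canvas.toBuffer("image/png");
    }

    writeFileSync("phone.png", renderTextAsImage("+7 900 000-00-00"));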

There is this idea: dim the screen with a script so that the text is unreadable, and add an AJAX form to close it (a close button with a hidden element). If the visitor did not click it and moved on to another page, it is a bot; track the session across page transitions.
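
A browser-side sketch of that idea (the styling and wording are arbitrary): an overlay dims the page until a visible close button is clicked, and the click could additionally be reported to the server to mark the session as probably human.

    // Runs in the browser: dim the page until the visitor clicks the
    // close button. A naive bot that never executes JavaScript or
    // never clicks just leaves with unreadable content.
    document.addEventListener("DOMContentLoaded", () => {
      const overlay = document.createElement("div");
      overlay.style.cssText =
        "position:fixed;inset:0;background:rgba(0,0,0,0.9);z-index:9999;";

      const closeButton = document.createElement("button");
      closeButton.textContent = "Continue to the site";
      closeButton.style.cssText =
        "position:absolute;top:50%;left:50%;transform:translate(-50%,-50%);";
      closeButton.addEventListener("click", () => {
        overlay.remove();
        // Optionally notify the server here that this session clicked,
        // i.e. is probably human.
      });

      overlay.appendChild(closeButton);
      document.body.appendChild(overlay);
    });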

  • And what if JS is turned off?) - Ozim
    • 3
      The authors of such shaders absolutely do not think about mobile users. You go to the site, but you cannot close such a window (they often center it and the button is inaccessible). - KoVadim
  • Still, the fact remains that users who have JS disabled will not be able to get to the site at all. - Ozim
  • The dimming can be enabled via JS, and visitors without JS can be asked to click a button inside <noscript>. - Palladium Inhibitor