I registered my own site error pages in .htaccess, for the two most common errors so far:

ErrorDocument 403 "/error/403/"
ErrorDocument 404 "/error/404/"

And I found that Apache simply serves the specified files "as is": it not only shows them to the user but also returns them to scripts (for example, when the path to a style sheet is specified incorrectly) with status code 200. Fortunately, the code is easy to change:

 header("HTTP/1.1 403 Forbidden"); 

and

 header("HTTP/1.1 404 Not Found"); 

respectively. But what about the page body? Obviously, for site scripts and third-party programs (search robots, RSS aggregators, applications connected to the site's API, etc.) the error code alone is enough. Is it possible to send them only the code, while a user who opens the page in a browser gets the full document? And what is the best way to distinguish automated requests from human visitors?

    1 answer

    And I found that Apache simply serves the specified files "as is": it not only shows them to the user but also returns them to scripts (for example, when the path to a style sheet is specified incorrectly) with status code 200

    If I understand correctly, you are saying that when a page links to a nonexistent asset (a script or style sheet), your server responds with an HTML document. The most obvious solution is simply to make sure the links are always valid. Often one style sheet and one JS file are enough for a whole site. And if you use templates, you only need to specify the asset paths in one file, and you no longer have to worry that something will break somewhere.
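
A minimal sketch of the "asset paths in one file" idea. The `ASSETS` map, the `asset()` helper, and the paths are hypothetical; the point is only that templates never hard-code asset URLs, so a wrong path has to be fixed in exactly one place.

```php
<?php
// Hypothetical single source of truth for asset URLs; in a real project
// this array would live in one config file included by every template.
const ASSETS = [
    'css' => '/assets/app.css',
    'js'  => '/assets/app.js',
];

// Hypothetical helper: templates call asset('css') instead of typing the path.
function asset(string $name): string
{
    if (!array_key_exists($name, ASSETS)) {
        // Fail loudly during development instead of emitting a broken link.
        throw new InvalidArgumentException("Unknown asset: $name");
    }
    return ASSETS[$name];
}

// Usage inside a template:
echo '<link rel="stylesheet" href="' . htmlspecialchars(asset('css')) . '">', "\n";
echo '<script src="' . htmlspecialchars(asset('js')) . '"></script>', "\n";
```

Throwing on an unknown name means a typo surfaces as an error while developing rather than as a silent request for a missing file.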

    In any case, it makes sense to use routing and MVC. Once you adopt these two techniques, problems like this will largely solve themselves. Hardly anyone handles errors in .htaccess.
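
To illustrate the single-entry-point idea, here is a minimal sketch of a router in a front controller, assuming the web server sends every request to one `index.php` (for example via Apache's `FallbackResource /index.php`). The route table, paths, and handler names are hypothetical, and a real framework router does far more. Because unknown paths never reach `ErrorDocument`, the script itself chooses the status code.

```php
<?php
// Hypothetical route table: HTTP method => [path => handler name].
$routes = [
    'GET'  => ['/' => 'home', '/about' => 'about'],
    'POST' => ['/contact' => 'contact'],
];

// Resolve a request to [status code, handler or null, allowed methods].
function resolve(array $routes, string $method, string $path): array
{
    // Exact match on both method and path.
    if (isset($routes[$method][$path])) {
        return [200, $routes[$method][$path], []];
    }
    // Collect the methods under which this path does exist.
    $allowed = [];
    foreach ($routes as $routeMethod => $paths) {
        if (isset($paths[$path])) {
            $allowed[] = $routeMethod;
        }
    }
    // Known path with the wrong method: 405; unknown path: 404.
    return $allowed ? [405, null, $allowed] : [404, null, []];
}

// Front controller: every request lands in this file.
$method = $_SERVER['REQUEST_METHOD'] ?? 'GET';
$path = parse_url($_SERVER['REQUEST_URI'] ?? '/', PHP_URL_PATH) ?: '/';
[$status, $handler, $allowed] = resolve($routes, $method, $path);

// The script, not Apache, sets the status code.
http_response_code($status);
if ($status === 405) {
    // A 405 response should list the methods that are allowed.
    header('Allow: ' . implode(', ', $allowed));
}
echo $handler === null ? "Error $status\n" : "Dispatching to $handler\n";
```

Checking the method as well as the path is what "routing by request method" means here: a known URL requested with the wrong verb gets 405 instead of a misleading 404.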

    It seems to me that you are not using frameworks, even though it is almost 2017 and frameworks solve problems like this out of the box. I advise you to read my other answers, where I covered similar topics and explained why you should use a single entry point into the application, MVC, routing, and possibly frameworks:

    Obviously, for site scripts and third-party programs (search robots, RSS aggregators, applications connected to the site's API, etc.) the error code alone is enough. Is it possible to send them only the code, while a user who opens the page in a browser gets the full document?

    There is no point in doing this. If a search engine receives an unambiguous response code (for example, 403 Forbidden or 404 Not Found), it does not care what the page body looks like. After a while it will simply drop the page from its index.

    Instead, do this:

    • API requests: return a JSON object containing the error and its description (for example, "API endpoint not found").
    • Everything else: a nicely designed message saying that the page was not found.

    In all cases, return the appropriate HTTP status code: 403 or 404.
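
A minimal sketch of that split, assuming (hypothetically) that API endpoints live under an `/api/` path prefix; adjust the check to your own routing. The function names and message texts are illustrative only. Both branches keep the real 403/404 status; only the body format differs.

```php
<?php
// Build [status, content type, body] for an error.
// Assumption: API endpoints are under /api/.
function buildErrorResponse(int $status, string $path): array
{
    $reasons = [403 => 'Forbidden', 404 => 'Not Found'];
    $descriptions = [403 => 'Access denied', 404 => 'API endpoint not found'];
    $reason = $reasons[$status] ?? 'Error';

    if (strpos($path, '/api/') === 0) {
        // API clients get a machine-readable JSON error.
        $body = json_encode([
            'error'       => $reason,
            'description' => $descriptions[$status] ?? $reason,
        ]);
        return [$status, 'application/json', $body];
    }

    // Everyone else gets a human-friendly HTML page.
    $body = "<!DOCTYPE html>\n<title>$status $reason</title>\n"
          . "<h1>Sorry, this page is unavailable ($status $reason)</h1>";
    return [$status, 'text/html; charset=utf-8', $body];
}

// Send the error with an explicit status code, so the page is not served as 200.
function sendError(int $status, string $path): void
{
    [$code, $type, $body] = buildErrorResponse($status, $path);
    http_response_code($code);
    header('Content-Type: ' . $type);
    echo $body;
}
```

In a front controller you would call something like `sendError(404, $path)` when no route matches. Keeping the response building separate from sending makes it easy to test without a web server.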

    And what is the best way to distinguish automated requests from human visitors?

    By the User-Agent header. Each robot identifies itself with its own User-Agent string.
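
A rough sketch of User-Agent matching. The token list is an assumption and far from exhaustive, and the header is trivially spoofed, so this is only suitable for deciding how much page body to send, never for access control. The status code should stay the same either way.

```php
<?php
// Heuristic: does the User-Agent look like a crawler or a script?
// The token list below is an illustrative assumption, not a complete registry.
function looksLikeBot(string $userAgent): bool
{
    if (trim($userAgent) === '') {
        // Mainstream browsers send a User-Agent; an empty one is most likely a script.
        return true;
    }
    return preg_match('/bot|crawl|spider|slurp|curl|wget|feed|rss/i', $userAgent) === 1;
}

$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
echo looksLikeBot($ua) ? "automated client\n" : "browser\n";
```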

    • I do use MVC, and in general I try to replicate the structure of a Rails project (slightly modified) when developing web applications. Security is fine too: only the API and the index file are accessible from the network. But I forgot about routing by request method, thanks for reminding me. I do not use frameworks because I prefer the application to do what I need rather than what the framework developer considers necessary; I had enough of fussing with Rails and its endless version changes without backward compatibility. - Risto
    • @Risto, unlike a CMS, a framework does not box you in or limit you in any way, especially a professional one (Laravel, for example). When developing a good project, you will end up reimplementing functionality that most frameworks provide out of the box, except theirs is richer, more reliable (covered by tests), and documented. There are no really compelling reasons to avoid them. All experienced developers use them, and you should too. - neluzhin