Recently I became interested in the security of the applications I build, in particular several aspects of data filtering in PHP and secure authentication / authorization. Google turns up a lot of scattered information, but not enough to draw conclusions from. Perhaps someone can explain or point me in the right direction.

Main questions:

  • Do I need to filter all input data completely, including the global arrays $_SERVER, $_REQUEST, $_GET, $_POST and $_COOKIE, even if it is not written to the database? What general points should I consider?
  • Is it better to use filter_var(), filter_input(), etc., or regular expressions? Or when is one preferable to the other?
  • What method of authorization on a site can be considered safe?
  • With PDO, is it safe to bind request variables directly, e.g. bindValue(':param', $_POST['value']); ? (I have never done that, I am just wondering how safe it is.)
  • If I have an HTML (WYSIWYG) editor, do I need to call htmlspecialchars($var, ENT_QUOTES, 'UTF-8'); before saving to the database and htmlspecialchars_decode($var, ENT_QUOTES); when reading it back?

What I have now:

Authorization on the site works as follows: the user enters a username / password. A request goes to the server, which tries to fetch the data (id, password, unique user hash) for the given login. If such a user exists, the password is checked using the password_hash function, password_hash($password, PASSWORD_DEFAULT); , and on success cookies are created along with a new hash for the user:

$user_hash = md5( md5( time() + time() * rand(2, 10) ));
SessionModel::setCookie('_auth', md5($user_id), AUTH_TIMEOUT);
SessionModel::setCookie('_token', $user_hash, AUTH_TIMEOUT);

So far I have not figured out where and how to use this hash sensibly to verify the user's identity. Most likely there is not even a hint of security here, which is why I am asking for advice.

Data Filtering:

About two weeks ago I switched completely to OOP and started using PDO. Before that I used mysqli for the connection, and to clean incoming data I wrote my own functions, such as:

 function clear($var)
 {
     // mysqli_connect_error() is the call that works when the connection itself failed
     $link = mysqli_connect(HOST, USER, PASSWORD, DB) or die(mysqli_connect_error());
     $var = strip_tags($var);
     $var = htmlspecialchars($var);
     $var = mysqli_real_escape_string($link, strip_tags($var));
     mysqli_close($link);
     return $var;
 }

Now I do not filter incoming data at all, except for HTML code, for which I use:

$encoded = htmlspecialchars($var, ENT_QUOTES, 'UTF-8');

$decoded = htmlspecialchars_decode(htmlspecialchars_decode($var, ENT_QUOTES), ENT_QUOTES);

The decode call is applied twice because the first time, for some reason, the decoded entities were not displayed properly. I do not know why; it started working this way by accident. I accept the rest of the data as follows:

$title = $_POST['title']; - sometimes I use trim() to remove spaces :))

In general, I understand that I am unlikely to get a detailed answer to every question here, but I would be very grateful even for an up-to-date article that answers these or similar questions. I have been learning PHP for about 1.5-2 years and do not know the answers to the simplest questions (or perhaps not so simple ones). As practice has shown, this kind of thing is hard to find on Google.

I will also be glad to hear general recommendations :) Thank you.

    4 answers

    Do I need to completely filter all input data

    You must not trust any data received from outside: $_GET, $_POST (by default these two make up $_REQUEST; depending on the request_order and variables_order settings in php.ini it can also include cookies, $_SERVER and environment variables), $_COOKIE, $_FILES, and data loaded from third-party systems (for example, through an API). The general point is that you should not look for an abstract filter against dangerous data, but understand what data you expect to find in a given place and what happens to it next. Output to CSV, output to HTML, or writing to a DBMS each requires its own specific processing.

    Is it better to use filter_var(), filter_input(), etc., or regular expressions?

    Anything that lets you verify that the data is correct. Start with a whitelist. Often you know in advance that, for example, $_GET['index'] can only be foo or bar. Check for exactly these two valid values.
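
    A minimal sketch of such a whitelist check. The helper name pickAllowed and the fallback to 'foo' are assumptions made for illustration; only the foo / bar values come from the text above.

```php
<?php
declare(strict_types=1);

/**
 * Returns $value if it is on the whitelist, otherwise $default.
 * Hypothetical helper, not part of PHP itself.
 */
function pickAllowed($value, array $allowed, string $default): string
{
    // Strict comparison: no type juggling, so 0 or true never match 'foo'.
    return in_array($value, $allowed, true) ? $value : $default;
}

// A missing key is treated the same way as an unexpected value.
$index = pickAllowed($_GET['index'] ?? null, ['foo', 'bar'], 'foo');
```

    Whether to fall back to a default or to reject the request outright depends on the page; the point is only that nothing outside the list gets through.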

    For a user's email, for example, there is a ready-made regular expression hidden inside filter_var. It is a good starting point and will usually work well. "Usually", because email is a really funny thing. If you read the relevant RFC, it turns out that it is easier to check that the address contains an @ symbol and send a message to it than to make sense of the whole variety of acceptable forms. Almost anything is permitted there.
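
    As a sketch, the built-in check looks roughly like this. The wrapper name isPlausibleEmail is made up; FILTER_VALIDATE_EMAIL is the standard filter. It only says the address looks plausible, so a confirmation message is still the real test.

```php
<?php
declare(strict_types=1);

/** Hypothetical wrapper around the built-in email filter. */
function isPlausibleEmail(string $email): bool
{
    // filter_var returns the value itself on success and false on failure.
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}
```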

    For a login, for example, you may want to restrict input to Latin letters and a few special characters. That is most easily done with a regular expression.
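
    One possible shape of such a check. The exact character set (letters, digits, underscore, dot, hyphen) and the 3 to 32 length limits are assumptions; pick whatever your project actually allows.

```php
<?php
declare(strict_types=1);

/** Hypothetical login rule: Latin letters, digits, "_", "." and "-". */
function isValidLogin(string $login): bool
{
    // \A and \z anchor the whole string; unlike $, \z does not
    // tolerate a trailing newline.
    return preg_match('/\A[A-Za-z0-9_.-]{3,32}\z/', $login) === 1;
}
```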

    The broadest case is usually free text input, such as this very message. As a rule, any UTF-8 characters are allowed.

    By the way, since I have brought it up: please do not validate the password in any way, except perhaps for a minimum length. Enforce a minimum complexity only if the subject area definitely requires it. And in any case, do not limit the maximum length. You are going to hash the password anyway, not store it, so let users enter whatever they like, as long as they like.

    What method of authorization on the site can be considered safe?

    It depends on the security requirements. An electronic digital signature is quite hard to get around (think banking). It is also hard to get around authorization that is allowed only from one specific IP of one specific VPN (corporate data). For a site that is less security-sensitive, HTTPS is enough to adequately cover MitM and encrypt the data (provided the server side is configured correctly! In recent years it has become rather easy to configure HTTPS incorrectly).

    You can also hash the original password on the client and send the hash to the server, so that the original password is never transmitted over the network.

    Without HTTPS? Set up HTTPS; the days of expensive certificates are over.

    Using PDO, can I not be afraid to bind variables right away

    There is no SQL injection in that case. With one important caveat right away: only if the connection encoding is configured correctly or emulation of prepared statements is disabled. https://stackoverflow.com/questions/134099/are-pdo-prepared-statements-sufficient-to-prevent-sql-injection
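
    A sketch of a connection set up with both precautions at once (explicit charset in the DSN and emulation turned off). The helper names, host, database name and credentials are placeholders, and the MySQL driver is assumed.

```php
<?php
declare(strict_types=1);

/** Builds a MySQL DSN with an explicit connection charset. */
function mysqlDsn(string $host, string $dbName): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbName);
}

/** Options that disable emulated prepares and make errors throw. */
function pdoOptions(): array
{
    return [
        PDO::ATTR_EMULATE_PREPARES => false,
        PDO::ATTR_ERRMODE          => PDO::ERRMODE_EXCEPTION,
    ];
}

// Usage (all connection values are placeholders):
// $pdo  = new PDO(mysqlDsn('localhost', 'app'), 'app_user', 'secret', pdoOptions());
// $stmt = $pdo->prepare('SELECT id FROM users WHERE login = :login');
// $stmt->bindValue(':login', $_POST['login'] ?? '', PDO::PARAM_STR);
// $stmt->execute();
```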

    But you still have to check for logical errors. For example, do you think it is safe to use (int) $_POST['amount'] as :amount in the following query?

     UPDATE users SET balance = balance - :amount WHERE id=:user 

    (This is only an example. In reality such a place would involve a double-entry accounting record, and the check would additionally be enforced trivially at the record level in the DBMS (if only MySQL could check anything at all), but as one DBA says, people grasp examples about money fastest.)

    What if you pass -100? Do you get money credited instead of debited?
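
    A sketch of the kind of logical check meant here: the amount must be an integer of at least 1 before it is ever bound to :amount. The helper name parsePositiveAmount is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Returns the amount as a positive integer, or null if the input is
 * missing, not an integer, zero or negative.
 */
function parsePositiveAmount($raw): ?int
{
    $amount = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1],
    ]);

    return $amount === false ? null : $amount;
}

$amount = parsePositiveAmount($_POST['amount'] ?? null);
if ($amount === null) {
    // Reject the request here instead of running the UPDATE.
}
```

    Unlike a bare (int) cast, this also refuses input such as "12abc" instead of silently turning it into 12.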

    If I have an HTML (wysiwyg) editor, then I need to use functions before saving to the database

    A very interesting question, and the right behavior depends on the degree of trust. Do you trust whoever uses this editor? That is, should the result be real HTML that is output as HTML? This is common for the admin area of a CMS. In that case you should not validate this field at all. htmlspecialchars($var, ENT_QUOTES, 'UTF-8') should be called on this text when it is inserted into the textarea; otherwise some arbitrary text will break everything.
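
    To illustrate the textarea case, a small sketch. The function name renderEditor is invented for this example; the stored HTML stays untouched and is escaped only while it is placed inside the form.

```php
<?php
declare(strict_types=1);

/** Renders an edit field for trusted HTML without breaking the form. */
function renderEditor(string $name, string $html): string
{
    $escape = static fn (string $s): string => htmlspecialchars($s, ENT_QUOTES, 'UTF-8');

    // Without escaping, a "</textarea>" inside $html would close the field early.
    return '<textarea name="' . $escape($name) . '">' . $escape($html) . '</textarea>';
}
```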

    If you do not trust the user but HTML is still expected, you must thoroughly parse it into tokens and check all submitted HTML against a whitelist. I will not recommend specific tools; I only know that they exist. The problem is that, for example, you want to allow inserting <img src> , and someone slips you <img src='...' onload="alert(document.cookie)"> and that's it. Instead of a harmless alert there may be something more interesting. And htmlspecialchars is not an option, because then the picture would not display either.

    If there should be no HTML at all, use htmlspecialchars. You can apply it before writing to the database, but logically it is more appropriate to apply it at the moment of output to HTML. But not strip_tags. Why would you delete what the user entered? You must store it correctly and display it correctly, not delete it.
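
    A sketch of "store as entered, escape on output". The short helper name e() is a common convention but an assumption here, as is the sample string.

```php
<?php
declare(strict_types=1);

/** Escapes text for safe output inside HTML. */
function e(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// The database keeps exactly what the user typed...
$stored = 'Tom & Jerry <script>alert(1)</script>';

// ...and escaping happens only at the moment of output.
$html = '<p>' . e($stored) . '</p>';
```

    The same stored value can then go to CSV, JSON or an email, each with its own encoding, without first having to undo HTML escaping.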

    If there is one, then the password is checked using the password_hash function password_hash($password, PASSWORD_DEFAULT);

    Is this a mistake in the question? password_hash does not check anything. password_verify does the checking.
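
    Roughly, the two functions divide the work like this. The sample password and the helper name checkPassword are illustrative.

```php
<?php
declare(strict_types=1);

// At registration: hash once and store only the hash.
$storedHash = password_hash('correct horse', PASSWORD_DEFAULT);

/** At login: compare the submitted password with the stored hash. */
function checkPassword(string $password, string $storedHash): bool
{
    return password_verify($password, $storedHash);
}

// Optional: upgrade old hashes after a successful login.
if (checkPassword('correct horse', $storedHash)
    && password_needs_rehash($storedHash, PASSWORD_DEFAULT)) {
    $storedHash = password_hash('correct horse', PASSWORD_DEFAULT);
}
```

    Calling password_hash again and comparing strings would not work, because every call produces a different salt and therefore a different hash.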

    Why you are writing something that is oh so far from a CSPRNG into what appears to be a cookie, and how you plan to use it later, I cannot imagine either.

    CSPRNG is a cryptographically secure pseudo-random number generator.
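
    For comparison with md5(md5(time() ...)) from the question, a token drawn from a CSPRNG could be produced like this. The function name newToken and the 32-byte default are assumptions.

```php
<?php
declare(strict_types=1);

/** Returns an unpredictable hex token (2 hex characters per random byte). */
function newToken(int $bytes = 32): string
{
    // random_bytes() uses the operating system's CSPRNG.
    return bin2hex(random_bytes($bytes));
}
```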

    For authorization, just use sessions. Let me remind you of one obvious pitfall that people do not always notice: a session has no lifetime. None at all. There is only an amount of time since the last access to the session after which the garbage collector may delete it. And when will the garbage collector run? Who knows. All that time the session remains valid. So if your task requires invalidating the authorization an hour after login or after the user's last access, you have to implement that logic yourself.

    As for long-term authorization: in my opinion this answer is already huge, so that is better asked as a separate question.
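
    A sketch of the "do this logic yourself" part for an idle timeout. The key name last_seen, the constant and the function name are all assumptions; the login code is expected to set $_SESSION['last_seen'] = time() on success. For a limit counted from the moment of login, compare against a stored login time instead and do not refresh it.

```php
<?php
declare(strict_types=1);

const MAX_IDLE_SECONDS = 3600; // one hour, as in the example above

/**
 * Returns true and refreshes the timestamp if the session is still fresh;
 * otherwise wipes the session data and returns false.
 */
function touchOrExpire(array &$session, int $now, int $maxIdle = MAX_IDLE_SECONDS): bool
{
    $last = $session['last_seen'] ?? null;

    if (!is_int($last) || $now - $last > $maxIdle) {
        $session = []; // forget the login; the caller may also call session_destroy()
        return false;
    }

    $session['last_seen'] = $now;
    return true;
}

// Typical call at the top of a protected page, after session_start():
// if (!touchOrExpire($_SESSION, time())) { /* redirect to the login form */ }
```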

    Data Filtering:

    See the beginning of the answer. You need to know what you expect to find in the data and where the data goes next. Everything else has nothing to do with security; it is only crutches and an illusion of security. There is no magic "make it correct and safe for me" function.

    And, of course, you cannot be sure that the data reached you at all. First check with isset or, if that is acceptable for the values, empty. Or use filter_input, which also handles missing keys correctly.
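
    A small sketch of reading a value that may be absent. The helper name postedString is hypothetical; it also guards against the value arriving as an array (as with title[]=x), which a plain isset check would let through.

```php
<?php
declare(strict_types=1);

/** Returns the value for $key if it exists and is a string, else null. */
function postedString(array $source, string $key): ?string
{
    $value = $source[$key] ?? null; // no notice when the key is missing

    return is_string($value) ? $value : null;
}

$title = postedString($_POST, 'title');
if ($title === null) {
    // The field was not sent at all (or had an unexpected type).
}
```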

    I was also reminded about CSRF: remember that everything that changes the state of the system should be done through POST, PUT, PATCH or DELETE requests (outside of APIs, usually only POST is used) and be protected by a unique token. Whether it is unique globally, per user or per session is debatable. GET requests should be read-only. Two identical GET requests must return an identical result. Sometimes you have to deviate from this rule, for example for an "unsubscribe" link in emails (it changes subscription data), but that is an exception. Do not delete anything via a GET request.
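
    A sketch of a per-session CSRF token, which is only one of the variants mentioned above. The session key csrf and both function names are assumptions.

```php
<?php
declare(strict_types=1);

/** Returns the session's CSRF token, creating it on first use. */
function csrfToken(array &$session): string
{
    if (!isset($session['csrf']) || !is_string($session['csrf'])) {
        $session['csrf'] = bin2hex(random_bytes(32));
    }

    return $session['csrf'];
}

/** Checks a submitted token against the one stored in the session. */
function csrfValid(array $session, $submitted): bool
{
    return isset($session['csrf'])
        && is_string($session['csrf'])
        && is_string($submitted)
        && hash_equals($session['csrf'], $submitted); // constant-time comparison
}

// In the form:  <input type="hidden" name="csrf" value="...csrfToken($_SESSION)...">
// In the POST handler:  if (!csrfValid($_SESSION, $_POST['csrf'] ?? null)) { /* reject */ }
```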

      Do I need to completely filter all input data, incl. global arrays $_SERVER, $_REQUEST, $_GET, $_POST, $_COOKIE, even if they are not entered into the database. What general points should I consider?

      The main thing to understand about web security is that anything can arrive in a request. So filtering depends on the task, and talking about it at length here is pointless: whole books have been written about it. PHP frameworks offer many filtering tools: for example, you can strip all scripts and on* attributes from incoming HTML. But it is often more effective and readable to simply cast: $id = isset($_REQUEST['id']) ? (int)$_REQUEST['id'] : 0; . There is no need to filter everything in a loop in advance; that cannot be done universally (someone needs quotes in the text, someone needs whole HTML, someone needs binary data), but filtering in the controller everything that the controller needs is the standard approach.

      Using PDO, can I not be afraid to bind variables right away

      Of course you can, if you use it properly. Another answerer pointed to a vulnerability, but that is not a vulnerability, it is misuse: neither the keys nor the values of the HTTP request arrays should be inserted directly into the SQL query string, simply because they can contain anything. Better yet, use the SQL query builder from a framework (it can use an adapter of your choice: PDO, mysqli, Doctrine DBAL, ...). It is more convenient and more elegant. Read about it, I am sure you will like it: the article covers the old version, but it is better written than the article on the new version and is in Russian, and the versions are almost the same.

      Is it better to use filter_var(), filter_input(), etc., or regular expressions? Or when is it better to use one instead of the other?

      It is more convenient to use classes from frameworks, for example Zend\InputFilter or yii\base\Model. It is much nicer. I do not know about Yii, but with Zend you do not have to pull in the whole framework; you can pull in just the Zend\InputFilter component and use it.

      What method of authorization on the site can be considered safe?

      The main thing is to do it over HTTPS. One more note, given the popularity of md5: tokens used for authorization should not be generated with md5 alone, otherwise it is not safe.

      If I have an HTML (wysiwyg) editor, then I need to use functions before saving to the database

      I think that is not necessary if you then output the HTML on the site. BUT if users can post HTML to the site, then before writing it to the database you must strip scripts, on* attributes and other unsafe content from it. It would also be good to repair broken HTML and close unclosed tags before saving it to the database.

      PS And yes - I'm a fan of frameworks, they save a lot of time and nerves.

        Do I need to completely filter all input data, incl. global arrays

        $_REQUEST can be faked, see https://ru.wikipedia.org/wiki/Inter_site_in request

        $_GET, $_POST: these can easily be faked, no explanation needed.

        $_SERVER: some of the values of this array are populated from the received HTTP headers. As you understand, you can deliberately send anything in some headers. In particular, everything in the $_SERVER array that starts with HTTP_ can be faked.

        $_COOKIE on the server cannot be faked, as far as I remember, without hacking the server. PHP creates a cookie with a random value, the session identifier, and a file corresponding to that cookie. Only this cookie can be faked (or stolen).

        even if they are not entered into the database

        It is hard to say; it depends on how critical it is. For example, I change my HTTP_REFERER and the system redirects me to the wrong place. If everything is done correctly, I end up on a 404 page and nothing happens.

        Is it better to use filter_var(), filter_input()

        For all numeric values, plain intval(); just do not forget to check for 0. For text, I prefer regular expressions.

        Using PDO

        In PDO only prepared statements are safe, and even those have been bypassed: https://phpdelusions.net/pdo/sql_injection_example so htmlspecialchars and addslashes are apparently still with us.

        I need to use the htmlspecialchars functions before saving to the database

        Yes, because anyone can see where the editor sends its data and send a direct request there with invalid data.

        Now I do not use filtering for incoming data at all, except for html code, for this I use

        Use prepared statements. This is the main strength of PDO:

         // Prepare the query
         $b = $pdo->prepare("INSERT INTO `table` SET uid=:uid, uri=:uri");
         // Bind the parameters
         $b->bindParam(":uid", $uid);
         $b->bindParam(":uri", $link);
         // Execute
         $b->execute();

        So far, I have not yet figured out where and how to use this hash wisely

        Read on Habr: https://habrahabr.ru/post/184220/ https://habrahabr.ru/post/194972/ and do not try to swallow everything at once. If it is still too difficult, do it the old-fashioned way: plain SHA-256 hashing plus a salt.

        PS Since we are on this topic: add protection against CSRF attacks to all forms using a token https://habrahabr.ru/post/235247/ ; the token should be regenerated each time the page is refreshed.

        • This is not an injection, it is plain ignorance of how to use the tool. - etki

        Security is a very complex topic, and only an integrated approach helps here, one developed according to the security requirements of the project's data. There are some general principles, though.

        Data

        Data received from the user cannot be trusted. Filter and validate it case by case, depending on the requirements for that data; there is no universal solution.

        For validation, use a library from GitHub, for example respect/validation; do not reinvent the wheel for the 500th time.

        Secure Login

        If you set a cookie, bind it to the user's IP address and user-agent. The cookie should not contain data in the clear and should be httpOnly so that it cannot be stolen, and if it is stolen anyway, it will be invalid for the attacker. Use HTTPS for more secure data transfer.

        But is your DB server, where the cookie will be stored, safe? :)
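
        A sketch of the cookie flags mentioned above, using the options-array form of setcookie() available since PHP 7.3. The helper name, the cookie name _token and the lifetime are placeholders, and the samesite entry is an extra precaution not mentioned in this answer. Binding to the IP and user-agent would be a separate server-side check and is not shown.

```php
<?php
declare(strict_types=1);

/** Builds the options for an auth cookie that scripts cannot read. */
function authCookieOptions(int $now, int $ttl): array
{
    return [
        'expires'  => $now + $ttl,
        'path'     => '/',
        'secure'   => true,   // sent over HTTPS only
        'httponly' => true,   // hidden from JavaScript (document.cookie)
        'samesite' => 'Lax',  // assumption: not part of the original advice
    ];
}

// The value is an opaque random token, never the user id or other clear data:
// setcookie('_token', bin2hex(random_bytes(32)), authCookieOptions(time(), 86400));
```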

        PDO

        Prepared statements in PDO protect you from injections, but the input data must still be validated.

        HTML validation

        To validate incoming HTML, it is enough to create a whitelist of tags available to the user and strip the rest. An example is a library that lets you specify the allowed tags and even the allowed attributes for each tag.

        • Dear anonymous haters, when you downvote, please write the reason. - Firepro