Comprehensive form-based website authentication guide

Question

Form-based authentication for websites

We believe that Stack Overflow should be not only a resource for very narrow technical questions, but also a source of general guidelines on how to solve common problems. "Form-based authentication for websites" should be a good topic for such an experiment.

It should include the following topics:

How to log in
How to stay logged in
How to store a password
Use of secret questions
How to organize username / password recovery
OpenID
The "Remember me" checkbox
Autofill username / password
Secret URLs (public URLs protected by digest authentication)
Password strength check
and much more about form-based website authentication ...

It should not include topics such as:

Roles and authorization
HTTP Basic authentication

Please help us by:

Suggesting subtopics
Submitting good related articles
Adding to the official answer

You can also see the original question on Stack Overflow.

Inspired by this post meta.ru.stackoverflow.com/questions/70/…

Community spirit ♦ one · Accepted Answer · 2016-04-20T20:46:34

Part one: how to log in

We assume that you already know how to build a "login + password" form that sends a POST request to the server side. The sections below describe authentication patterns for practical use and how to avoid the most common pitfalls.

HTTPS or not HTTPS, that is the question

Unless the connection is already secure (that is, tunnelled over HTTPS using SSL/TLS), the data from your login form is transferred in plain text, which allows anyone on the line between the browser and the server to eavesdrop on it in transit. If you are transmitting any non-trivial data, use HTTPS.

In essence, the only practical way to protect against packet interception and analysis during login is to use HTTPS or another certificate-based encryption scheme (for example, TLS), or a proven and tested challenge-response scheme (for example, the Diffie-Hellman-based SRP). Any other scheme can be easily bypassed by an eavesdropping attacker.

Of course, you can go the other way and use some form of two-factor authentication (for example, Google Authenticator, a printed code book, or an RSA key generator). Applied correctly, this can work even over an unsecured connection, but it is hard to imagine a developer who would implement two-factor authentication and yet not use SSL.

(Do not) roll your own JavaScript cryptography

Given the non-zero cost and tangible complexity of setting up an SSL certificate on your website, some developers are tempted to build their own in-browser cryptographic schemes in order to avoid sending cleartext logins over an unsecured channel.

While this is a noble idea, it is essentially useless (and can even be a security flaw) unless it is combined with one of the schemes above: either securing the line with strong encryption, or using a time-tested challenge-response mechanism (if you do not know what that is, just know that it is one of the most difficult concepts in digital security to design and implement correctly).

CAPTCHA is the enemy of humanity

CAPTCHA is meant to thwart one specific category of attack: automated dictionary or brute-force guessing with no human operator. This is without doubt a real threat; however, there are more elegant ways of dealing with it that do not require a CAPTCHA, and we will get to them later.

Be aware that CAPTCHA implementations are not created alike: they are often unsolvable by humans, many of them are ineffective against bots, all of them are useless against cheap labor from third-world countries (according to OWASP, the going rate is about $12 per 500 attempts), and some implementations may be illegal in certain countries (see the OWASP Guide To Authentication). If you must use a CAPTCHA, use reCAPTCHA, since it is OCR-hard by definition (it uses book scans that OCR software has already failed to recognize) and tries hard to be user-friendly.

Personally, I find CAPTCHAs annoying. I use them only as a last resort, when a user has exceeded every reasonable limit of failed login attempts. This happens rarely enough to be acceptable, and it strengthens the system as a whole.

Storing / checking passwords

After all the massive breaches and data leaks we have seen in recent years this should be no secret to anyone, but it bears repeating: do not store passwords in cleartext in your database. User databases are routinely hacked, leaked, or stolen through SQL injection, and if you store raw, plaintext passwords, that is instant game over for your login security.

But if you cannot store the password, how do you check that the "login + password" combination is correct? The answer is hashing with a key derivation function (KDF). Whenever a new user is created or a password is changed, you take the password and run it through a KDF such as bcrypt, scrypt, or PBKDF2, turning the cleartext password ("Fuckin 'You have broken666") into a long, random-looking string that is much safer to store in your database. To verify a login, you run the entered password through the same KDF, this time passing in the stored salt, and compare the resulting hash with the one in the database. bcrypt and scrypt already store the salt together with the hash. Read this article (in English) on security.stackexchange if you want to dig deeper.

The reason for the salt is that hashing alone is not enough: the hash must be protected against rainbow tables. A salt prevents two identical passwords from producing the same hash, so an attacker who is guessing a password cannot test it against the whole database in a single run.

You should not use a plain cryptographic hash to store passwords, because user-selected passwords are not strong enough (that is, they usually do not contain enough entropy), and an attacker with access to the hashes can complete a guessing attack in a relatively short time. That is why KDFs are used. They effectively "stretch the key": every guess the attacker makes requires running the hash algorithm many times (say, 10,000 times), which makes guessing 10,000 times slower.
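As a rough illustration of the store-and-verify flow described above, here is a minimal sketch using PBKDF2 from the Python standard library. The iteration count, the 16-byte salt, and the "scheme$iterations$salt$hash" storage format are assumptions made for this example, not something the answer prescribes; in a real system a maintained bcrypt or scrypt library may be the better choice.

```python
import hashlib
import hmac
import os

# Assumed work factor for illustration; tune it for your own hardware.
ITERATIONS = 600_000


def hash_password(password: str, iterations: int = ITERATIONS) -> str:
    """Derive a salted hash and return it in a self-describing string."""
    salt = os.urandom(16)  # fresh random salt for every password
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    # Hypothetical storage format: scheme$iterations$salt$hash (hex encoded).
    return f"pbkdf2_sha256${iterations}${salt.hex()}${derived.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Re-derive the hash with the stored salt and compare in constant time."""
    _scheme, iterations, salt_hex, hash_hex = stored.split("$")
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(derived, bytes.fromhex(hash_hex))
```

Only the string returned by hash_password would be written to the database; the cleartext password is never stored.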

Session data - "You are logged in as KyCoK_3a6oPa_123"

Once the server has checked the login and password against the user database and found a match, the system needs a way to remember that the browser has been authenticated. This fact should only ever be stored on the server side, in the session data.

If you are not familiar with session data, here is how it works: a single randomly generated string is stored in a cookie with an expiry time and is used to reference a collection of data, the session data, which is stored on the server. If you are using an MVC framework, this is almost certainly handled for you already.

If at all possible, make sure the session cookie has the Secure and HttpOnly flags set when it is sent to the browser. The HttpOnly flag provides some protection against the cookie being read through a cross-site scripting attack. The Secure flag ensures that the cookie is only sent back over HTTPS, which protects against packet sniffing. The value of the cookie should not be predictable. If a cookie references a session that does not exist, its value should be replaced immediately to prevent session fixation.
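For readers whose framework does not do this for them, the following sketch shows what an unpredictable session cookie with both flags might look like, using only the Python standard library. The cookie name session_id, the one-hour lifetime, and the helper name new_session_cookie are assumptions for illustration.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Tuple


def new_session_cookie(max_age_seconds: int = 3600) -> Tuple[str, str]:
    """Return (session_id, Set-Cookie header value) for a fresh session."""
    # 32 random bytes from the OS CSPRNG, URL-safe encoded: not guessable.
    session_id = secrets.token_urlsafe(32)

    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["secure"] = True      # only sent back over HTTPS
    cookie["session_id"]["httponly"] = True    # not readable from JavaScript
    cookie["session_id"]["path"] = "/"
    cookie["session_id"]["max-age"] = max_age_seconds

    # The server keeps session_id as the key to its server-side session data.
    return session_id, cookie["session_id"].OutputString()
```

Calling the helper again whenever a presented cookie does not match a known session gives the browser a new identifier instead of adopting the one it sent.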

Part two: how to stay logged in or the notorious "Remember me" button

Persistent login cookies (the "Remember me" feature) are a danger zone. On the one hand, they are just as safe as regular logins when users understand how to handle them; on the other hand, they are a great security risk in the hands of inexperienced users, who may use them on public computers and forget to log out, or who may not know what cookies are or how to delete them.

Personally, I like persistent logins on the websites I visit regularly, but I know how to handle them safely. If you are sure that your users know the same, you can use persistent logins with a clear conscience. If not, you may take the philosophical view that users who are careless with their credentials have only themselves to blame if they get hacked. After all, we do not go to our users' homes and tear off the sticky notes with logins and passwords that they have stuck to their monitors, either.

Of course, some systems cannot afford to have even a single account hacked. For such systems, do not offer persistent logins.

If you still decide to use cookies to save logins, this is what you should do:

First, take some time to read the Paragon Initiative article on this topic. You will need to get a bunch of things right, and the article explains each of them very well.
And to repeat one of the most common mistakes: DO NOT STORE THE PERSISTENT LOGIN COOKIE (TOKEN) IN YOUR DATABASE, ONLY A HASH OF IT. The login token is equivalent to a password, so if attackers get hold of your database, they can use the tokens to log in, just as if they had the login and password combination. Therefore, use hashing (according to https://security.stackexchange.com/a/63438/5002, a weak hash serves this purpose perfectly) when storing persistent login tokens.
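The "store only the hash" rule can be sketched as follows. This is a minimal illustration, assuming a 32-byte random token and SHA-256 as the fast hash; the function names are hypothetical, and the article mentioned above covers further details that are left out here.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def issue_login_token() -> Tuple[str, str]:
    """Return (token for the cookie, hash to store in the database)."""
    token = secrets.token_urlsafe(32)  # high-entropy random value
    stored_hash = hashlib.sha256(token.encode("ascii")).hexdigest()
    return token, stored_hash


def token_matches(presented_token: str, stored_hash: str) -> bool:
    """Hash the token from the cookie and compare it in constant time."""
    presented_hash = hashlib.sha256(presented_token.encode("utf-8")).hexdigest()
    return hmac.compare_digest(presented_hash, stored_hash)
```

Because the token is long and random rather than chosen by a person, a single fast hash is enough here, unlike for user passwords.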

Part Three: Using Secret Questions

Do not use "secret questions". The "secret question" feature is a security anti-pattern. Read the document at link number 4 in the required-reading list. You can ask Sarah Palin about that one: her Yahoo! account was hacked during a previous election campaign because the answer to her secret question was "Wasilla High School"!

Even if users enter a secret question themselves, it is likely that they will choose:

A question from the "standard set", such as mother's maiden name or the name of a favorite pet
A simple piece of information that anyone can get from their personal blog, LinkedIn profile, or the like
Any question whose answer is easier to brute-force than a password. That is, any question you can imagine.

Conclusion: secret questions are inherently insecure in virtually all their forms and variations and should not be used for authentication for any reason.

The real reason secret questions exist at all is that they conveniently save the cost of tech support calls from users asking for a reactivation code. This comes at the expense of security and of Sarah Palin's reputation. Worth it? I am sure it is not.

Part Four: Forgotten Password Recovery

I have already explained why you should not use secret questions to recover forgotten or lost user passwords; it also goes without saying that you should never e-mail users their actual passwords. There are at least two more common pitfalls to avoid here.

Do not reset a forgotten password to an auto-generated strong password: such passwords are difficult to remember, which means that the user will either change it or write it on a yellow sticky note and attach it to the bottom of the monitor. Instead of setting a new password, let users pick a new one right away, which is what they want to do anyway.
Always hash the lost-password code or token in the database. Again, this code is another example of a password equivalent, so it MUST be hashed in case an attacker gets hold of your database. When a lost-password code is requested, send the plaintext code to the user's e-mail address, then hash it, save the hash in the database, and throw away the original.

One final note: make sure that your interface for entering the lost-password code is at least as secure as your login form, or an attacker will simply use it to gain access instead. Generating very long lost-password codes (for example, 16 case-sensitive alphanumeric characters) is a good start, but do not forget to add the same protection schemes that you use for the login form.
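A possible shape for the lost-password code handling is sketched below. The in-memory dictionary stands in for a database table, the class and function names are hypothetical, and making the code single-use is an assumption added for the example; expiry times and throttling are deliberately left out.

```python
import hashlib
import hmac
import secrets
import string
from typing import Dict

ALPHABET = string.ascii_letters + string.digits  # case-sensitive alphanumerics


def new_reset_code(length: int = 16) -> str:
    """Generate a random code from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def _hash_code(code: str) -> str:
    return hashlib.sha256(code.encode("utf-8")).hexdigest()


class ResetCodes:
    """Keeps only hashes of outstanding codes (stand-in for a DB table)."""

    def __init__(self) -> None:
        self._hashes: Dict[str, str] = {}

    def issue(self, user_id: str) -> str:
        code = new_reset_code()
        self._hashes[user_id] = _hash_code(code)
        return code  # e-mail this to the user, then discard it

    def redeem(self, user_id: str, code: str) -> bool:
        stored = self._hashes.get(user_id)
        if stored is None or not hmac.compare_digest(stored, _hash_code(code)):
            return False
        del self._hashes[user_id]  # assumption: each code works only once
        return True
```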

Part Five: Password strength check

First, you need to read this small article to find out the real state of things: the 500 most common passwords (in English)

Okay, maybe this is not the canonical list of the most common passwords on every system everywhere, ever, but it is a good indication of how poorly users choose their passwords when nothing is enforced. Plus, the list looks painfully similar to the lists produced by analyzing recently stolen accounts.

So: with no minimum password length requirements, 2% of users use one of the 20 most common passwords. In other words, if an attacker has a dictionary of just 20 passwords, 1 in 50 accounts on your website can be cracked.

This can be avoided by calculating the entropy of a password and enforcing a minimum threshold. NIST Special Publication 800-63 contains a set of very good suggestions. Combined with a dictionary and keyboard layout analysis, it can reject 99% of all poorly chosen passwords at a level of 18 bits of entropy. Simply calculating password strength and showing a visual strength indicator is good practice, but not sufficient. Unless the check is enforced, users will most likely ignore it.
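To make the "dictionary plus keyboard layout" idea a little more concrete, here is a deliberately small sketch of an enforced check. It does not compute entropy the way the NIST publication describes; the six-entry password set, the keyboard rows, and the 9-character minimum are all assumptions chosen for illustration, and a real deployment would load a much larger dictionary.

```python
from typing import List

# Tiny illustrative sample; a real check would use a large dictionary file.
COMMON_PASSWORDS = {"123456", "password", "12345678", "qwerty", "abc123", "111111"}

# Rows of a QWERTY keyboard, used to spot simple left-to-right runs.
KEYBOARD_ROWS = ("1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm")


def is_keyboard_run(password: str) -> bool:
    """True if the whole password is a run along one keyboard row."""
    lowered = password.lower()
    if len(lowered) < 3:
        return False
    return any(lowered in row or lowered in row[::-1] for row in KEYBOARD_ROWS)


def password_problems(password: str, min_length: int = 9) -> List[str]:
    """Return the reasons for rejecting a password (empty list = accepted)."""
    problems = []
    if len(password) < min_length:
        problems.append("too short")
    if password.lower() in COMMON_PASSWORDS:
        problems.append("common password")
    if is_keyboard_run(password):
        problems.append("keyboard pattern")
    return problems
```

The point of returning reasons rather than a score is that the form can refuse the password outright and tell the user why, instead of only showing a meter.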

To get rid of the stereotype that high-entropy passwords are hard to remember, we recommend this comic.

Part Six: Much more, or preventing rapid-fire login attempts

First, let's take a look at the numbers.

If you do not have time to go through the data in the article, here is a brief summary:

It takes virtually no time to crack a weak password, even if you are cracking it with an abacus
It takes virtually no time to crack an alphanumeric 9-character password if it is case insensitive
It takes virtually no time to crack an intricate password with symbols, letters, and numbers in upper and lower case if it is less than 8 characters long (an ordinary computer can search the entire keyspace up to 7 characters in a matter of days or even hours)
It would, however, take an inordinate amount of time to crack even a 6-character password if you were limited to one attempt per second
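The last point is easy to check with back-of-the-envelope arithmetic. The sketch below assumes a 62-symbol alphabet (upper and lower case letters plus digits) and a worst-case search of the entire keyspace; both are assumptions for illustration, not figures from the article.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct passwords of exactly this length."""
    return alphabet_size ** length


def years_to_exhaust(alphabet_size: int, length: int,
                     attempts_per_second: float = 1.0) -> float:
    """Worst-case time to try every password at the given rate."""
    return keyspace(alphabet_size, length) / attempts_per_second / SECONDS_PER_YEAR


# 62**6 = 56,800,235,584 candidates; at one attempt per second
# that is roughly 1,800 years for the full keyspace.
six_char_years = years_to_exhaust(62, 6)
```

The same 6-character keyspace falls in under a minute at a billion guesses per second, which is why limiting the attempt rate matters so much more than the attacker's hardware.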

So what can we learn from these numbers? Well, a lot, but let us focus on the main point: defending against large numbers of rapid-fire login attempts (that is, a brute-force attack) is not that difficult. But doing it right is not as simple as it seems.

In short, there are three options that are effective against brute-force attacks (and dictionary attacks, but since you already employ a strong password policy, those should no longer be a problem):

Present a CAPTCHA after N failed attempts (annoying as hell and often ineffective, but I am repeating myself)
Lock the account and require e-mail verification after N failed attempts (this is a DoS attack waiting to happen)
And finally, login throttling: setting a time delay between attempts after N failed attempts (yes, DoS attacks are still possible, but they are far less likely and much easier to cope with)

Best practice #1: a short time delay that increases with the number of failed login attempts, for example:

1 failed attempt - no delay
2 failed attempts - 4 seconds
3 failed attempts - 8 seconds
4 failed attempts - 16 seconds
and so on

DoS attacking this scheme would be very impractical, since each lockout time is longer than all the previous lockout times put together.
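The schedule above doubles with every failure, which can be written in one line. The sketch below assumes the pattern simply continues (32 seconds after the fifth failure, and so on); the function name is hypothetical.

```python
def lockout_seconds(failed_attempts: int) -> int:
    """Lockout after the given number of consecutive failed attempts.

    Matches the list above: 1 -> 0 s, 2 -> 4 s, 3 -> 8 s, 4 -> 16 s.
    """
    if failed_attempts < 2:
        return 0
    return 2 ** failed_attempts


# Each lockout exceeds the sum of all earlier ones, e.g. 16 > 4 + 8.
earlier_total = sum(lockout_seconds(n) for n in range(1, 4))
```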

To clarify: the delay is not a delay before the response is returned to the browser. It is more like a timeout or refractory period during which login attempts to a specific account or from a specific IP address are not accepted at all. That is, correct credentials will not result in a successful login, and incorrect credentials will not increase the delay.

Best practice #2: a medium-length time delay that goes into effect after N failed attempts, for example:

1-4 failed attempts - no delay
5 failed attempts - 15-30 minutes delay

DoS attacking this scheme would be quite impractical, but certainly doable. It is also worth noting that such a long delay can be very annoying for legitimate users. Forgetful users in particular will hate you.

Best practice #3: combining the two approaches. Either a fixed, short time delay that goes into effect after N failed attempts, for example:

1-4 failed attempts - no delay
5+ failed attempts - 20 seconds delay

Or an increasing delay with a fixed upper bound, for example:

1 failed attempt - 5 seconds
2 failed attempts - 15 seconds
3+ failed attempts - 45 seconds

This final scheme was taken from the OWASP best-practice suggestions (link 1 in the required-reading list) and should be considered best practice.
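Here is one way the capped schedule could be wired into a login check, together with the "refractory period" behaviour described earlier: while a key is locked, attempts are not evaluated at all, so they neither succeed nor lengthen the delay. The class name, the in-memory dictionary, and the injectable clock are assumptions made to keep the sketch self-contained and testable.

```python
import time
from typing import Callable, Dict, Tuple

SCHEDULE = {1: 5, 2: 15}  # failed attempts -> lockout in seconds
MAX_DELAY = 45            # applies from the third failure onwards


def delay_for(failures: int) -> int:
    """Lockout length after this many consecutive failures."""
    if failures <= 0:
        return 0
    return SCHEDULE.get(failures, MAX_DELAY)


class LoginThrottle:
    """Tracks failures per key (an account name or an IP address)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._state: Dict[str, Tuple[int, float]] = {}  # key -> (failures, locked_until)

    def attempt(self, key: str, check_credentials: Callable[[], bool]) -> str:
        failures, locked_until = self._state.get(key, (0, 0.0))
        now = self._clock()
        if now < locked_until:
            # Refractory period: do not even look at the credentials.
            return "throttled"
        if check_credentials():
            self._state.pop(key, None)  # success resets the counter
            return "ok"
        failures += 1
        self._state[key] = (failures, now + delay_for(failures))
        return "failed"
```

Passing the credential check as a callable keeps the throttle independent of how passwords are verified; during a lockout the callable is never invoked.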

As a rule of thumb, I would say: the stronger your password policy is, the less you have to torment users with delays. If you require strong passwords of 9+ characters (case-sensitive, with both letters and numbers required), you can give users 2-4 password attempts without delay before activating the throttling.

DoS attacking this final scheme would be very impractical. And as a final touch, always allow persistent (cookie) logins (and/or a CAPTCHA-protected login form) to pass through, so that legitimate users are not delayed while an attack is in progress. That way, a very impractical DoS attack becomes an extremely impractical one.

In addition, it makes sense to apply stronger protection to administrator accounts, since those are the most attractive targets for attackers.

Part Seven: Distributed Brute Force Attack

As an aside, more advanced attackers will try to circumvent login throttling by "spreading their activities":

Distributing the attempts across a botnet to avoid IP address flagging
Rather than picking one user and trying the 50,000 most common passwords (which they cannot do, because of the throttling), they will pick the most common password and try it against 50,000 users instead. That way they stay far below the limit on failed attempts per account, and their chances improve, since the most popular password is far more likely to hit than password number 49,995.
Spacing the login requests for each account, say, 30 seconds apart, to stay under the radar

Here, the best practice is to log the number of failed logins system-wide and to use a running average of that number as the basis for an upper limit that you then impose on all users.

Too abstract? Let me rephrase:

Say your site has had an average of 120 failed logins per day over the past 3 months. Using that running average, your system might set the global limit to 3 times that, that is, 360 failed attempts per day. Then, if the total number of failed attempts across all accounts exceeds that threshold within one day (or, even better, monitor the rate of change of the average and trigger on a calculated threshold), activate system-wide login throttling for ALL users (again, with the exception of cookie logins and CAPTCHA-verified logins).
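The arithmetic of that example is small enough to sketch directly. The function names and the list-of-daily-counts input are assumptions for illustration; the multiplier of 3 is the one used in the example above.

```python
from typing import Sequence


def global_failure_limit(daily_failures: Sequence[int], multiplier: float = 3.0) -> float:
    """System-wide limit: a multiple of the average failed logins per day."""
    if not daily_failures:
        raise ValueError("need at least one day of history")
    return multiplier * sum(daily_failures) / len(daily_failures)


def should_throttle_everyone(failures_today: int,
                             daily_failures: Sequence[int],
                             multiplier: float = 3.0) -> bool:
    """True once today's failures across all accounts exceed the limit."""
    return failures_today > global_failure_limit(daily_failures, multiplier)


# 90 days averaging 120 failures per day give a limit of 360.
history = [120] * 90
limit = global_failure_limit(history)
```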

Also check out the question on SO, which includes more information on how to avoid the pitfalls in fending off distributed brute-force attacks.

Part Eight: Two-Factor Authentication and Authentication Providers

Credentials can be compromised by exploits, by passwords being written down and lost, by laptops being stolen, or by users entering them on phishing sites. Logins can be further protected with two-factor authentication, which uses an additional factor such as a one-time code received by phone call, SMS message, app, or hardware token. Several providers offer two-factor authentication services.
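For the "one-time code from an app" variant, authenticator apps commonly implement the time-based one-time password algorithm (TOTP, RFC 6238, built on HOTP from RFC 4226). The sketch below shows the core computation with the standard library only; it assumes the raw shared secret is already available as bytes and leaves out provisioning, clock-drift windows, and replay protection, so a maintained library is the safer choice in practice.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, for_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over a time counter."""
    if for_time is None:
        for_time = time.time()
    return hotp(secret, int(for_time) // step, digits)


def code_is_valid(secret: bytes, submitted: str) -> bool:
    """Compare the submitted code with the current one in constant time."""
    return hmac.compare_digest(totp(secret), submitted)
```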

Authentication can be delegated entirely to a single-sign-on service, where another provider handles collecting credentials. This pushes the problem onto a trusted third party. Google, Yandex, and Twitter provide standards-based SSO services, while Facebook provides a similar proprietary solution.

REQUIRED READING about web authentication (in English)
