I will list all the user identification methods known to me.
IP address
I mention this method first because it is the only one that cannot be faked. An IP address can be borrowed from someone else (proxy, VPN, Tor, or simply a dynamic IP), but this is usually harder than, say, clearing cookies. Nor can you delete an IP address the way you clear cookies: you always have to have one. Because of this relative reliability (not everyone will bother keeping hundreds of proxy servers on hand to rotate IPs), it is often used to strengthen security: for example, by limiting the maximum number of requests per second / minute / hour from a single IP. However, an IP address cannot tell apart different people sharing one Internet connection, which contradicts the condition in the question, so let's move on.
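The per-IP request limit mentioned above can be sketched roughly as follows. This is a minimal in-memory sliding-window limiter, not a production one; the class name IpRateLimiter, its parameters and the example addresses are made up for illustration.

```python
import time
from collections import defaultdict, deque


class IpRateLimiter:
    """Allow at most `limit` requests per `window` seconds from each IP."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        # `now` can be injected for testing; otherwise use a monotonic clock.
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Everyone behind the same address shares one counter here, which is exactly the limitation described above.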
Plain login and password
The idea is simple: we just send the login and password with every request. One implementation of this method is already part of the HTTP protocol itself, via the Authorization header, and is supported by all major web browsers and web servers.
In the HTTP version, it works as follows:
on the first visit to the site, the client has nothing and sends no additional information to the server. The server responds with a 401 Unauthorized error and adds a WWW-Authenticate HTTP header describing the available login methods (for a simple login and password, this is Basic realm="default")
the client receives this and asks the user for a login and password. It then sends its request again, this time with the Authorization HTTP header, which contains the base64-encoded login and password: Basic YWRtaW46MTIzNDU2. Decoding this example gives admin:123456, the login and password separated by a colon
the site checks them and either responds normally or returns 401 again, and the client asks for the login and password once more
The client then sends this Authorization: Basic YWRtaW46MTIzNDU2 header with every subsequent request.
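Building and parsing the header value from the example above might look like this. It is a small sketch using only the standard library; the helper names make_basic_header and parse_basic_header are my own, not part of any API.

```python
import base64


def make_basic_header(login, password):
    """Build the value of the Authorization header for Basic auth."""
    raw = f"{login}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def parse_basic_header(value):
    """Return (login, password), or None if this is not valid Basic auth."""
    scheme, _, encoded = value.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:  # bad base64 or bad UTF-8
        return None
    # Only the first colon separates the two parts; a password may contain colons.
    login, sep, password = decoded.partition(":")
    return (login, password) if sep else None
```

Note that base64 is an encoding, not encryption: anyone who sees the header can decode it the same way.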
Advantages:
- simplicity. The HTTP version is already implemented in web browsers and web servers, so nothing needs to be invented. If you write your own version, it is enough to check the login and password on every request, with no further complications.
Problems:
without HTTPS, there is no security at all: the login and password effectively travel over the Internet in clear text. The client also has to keep the password stored in the clear;
the HTTP version in browsers works only within the current session; after the browser is restarted, the login and password must be entered again.
To be fair, HTTP can do more than a bare login and password (there is a list, possibly complete, of authorization options), but I will not dwell on the other methods because they are rarely used.
Random string
The simplest method, the best balanced in terms of "security / convenience", and the most popular. The PHPSESSID cookie, (probably) the most common cookie in the world, is exactly this. The idea is:
on the first visit to the site, the client has nothing. The site notices this, creates a new random string (the longer the better, so that it is hard to guess; 30 characters at least) and, together with the usual response to the request, sends this generated string in some way (Set-Cookie, a redirect to a special link, or just in the body of the response if it is, for example, a JSON API)
the client receives this string along with the response and stores it somewhere (a browser stores it in cookies by itself, an SPA can put it in localStorage, and so on)
on subsequent visits to the site, the client adds this string to its request (cookies, the Authorization HTTP header, or just a GET parameter in the requested address)
if the client needs to be identified more specifically (logged in with a login and password, for example), the site records in its database that this particular random string corresponds to this particular login, and on subsequent requests reads that information back from the database.
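The steps above can be sketched like this. It is a rough illustration with an in-memory dict standing in for the database; the SessionStore class and its method names are invented for the example.

```python
import secrets


class SessionStore:
    """Maps random session strings to logins (a dict stands in for the database)."""

    def __init__(self):
        self._sessions = {}  # token -> login, or None while the visitor is anonymous

    def create(self):
        # 32 random bytes from the OS CSPRNG, about 43 URL-safe characters.
        token = secrets.token_urlsafe(32)
        self._sessions[token] = None
        return token

    def login(self, token, login):
        # Called after the login and password have been checked elsewhere.
        if token not in self._sessions:
            raise KeyError("unknown session token")
        self._sessions[token] = login

    def whoami(self, token):
        # Returns the login, or None for anonymous or unknown tokens.
        return self._sessions.get(token)

    def logout_everywhere(self, login):
        # Drop every session that belongs to this login.
        for token in [t for t, who in self._sessions.items() if who == login]:
            del self._sessions[token]
```

How the token travels to the client and back (cookie, header or URL parameter) is left out on purpose; the store only cares about the string itself.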
As for PHP, all of this is built in: when you call the session_start() function, a PHPSESSID cookie is generated from random letters and digits (or the existing one is read, if there already is one). The data associated with this cookie is stored in the $_SESSION array, which you can read and modify. By default, the contents of this array are saved to a file; on subsequent requests from the user, this file is read automatically when session_start() is called, and all the data you put into $_SESSION while processing previous requests is restored. See the documentation for details.
Advantages:
simplicity is obvious;
when the IP address changes (which happens often on mobile phones), identification does not break;
a "Log me out on all devices" button comes down to simply deleting all of the user's records from the database.
Problems:
the random string generator must be truly random (or, if not truly random, then cryptographically secure, and not uniqid()), because an attacker can try to guess a pseudo-random string (for example, by recovering the generator state in PHP or Python, or by brute-forcing sessions created through uniqid() in Invision Power Board). Under no circumstances should you use a hash of the login, a hash of the password, the current time, a single pre-prepared string or other non-random things as the string, because that makes guessing much easier. To learn how to get real randomness, read the documentation for your programming language. Or just use a ready-made implementation such as session_start() in PHP;
additional server load. To find out which user is hiding behind a random string, the server has to query the database. This is not a problem for the vast majority of sites, but for giants such as Google it already is;
cookies are sometimes buggy: for example, IE11 sends cookies to subdomains even when it was not asked to (this has already been fixed in Edge), which can leak data to third-party CDNs, for example. So keep an eye on how the browsers you target handle cookies. And do not forget about HttpOnly, so that cookies cannot be stolen through XSS (and about Secure, if the site uses HTTPS).
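Setting the HttpOnly and Secure flags mentioned in the last point might look like this with Python's standard http.cookies module. This is only a sketch: the cookie name sid and the function name session_cookie_header are assumptions, and real frameworks usually have their own way of setting these flags.

```python
from http.cookies import SimpleCookie


def session_cookie_header(token, https=True):
    """Build the value of a Set-Cookie header for a session token."""
    cookie = SimpleCookie()
    cookie["sid"] = token  # "sid" is an arbitrary cookie name for this example
    cookie["sid"]["path"] = "/"
    cookie["sid"]["httponly"] = True  # not readable from JavaScript
    if https:
        cookie["sid"]["secure"] = True  # only sent over HTTPS
    return cookie["sid"].OutputString()
```

The token itself should come from a cryptographically secure generator, such as secrets.token_urlsafe() in Python.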
Nonrandom but protected string (for example, JWT)
The idea is this: we brazenly violate the aforementioned ban on non-random data and put into the string, for example, the user ID and, optionally, the user's access rights (for example, admin), the expiration date of the string and any other data. But! Alongside this string we add a hash computed over the data plus a secret string that only the site knows and never gives to anyone. On each request from the client, the site checks that the hash is correct. This protects against tampering and forgery: to forge the data, the attacker would have to recompute the hash, and without knowing the secret string that is impossible. (The secret string should be VERY long, around a hundred characters, so that it cannot be guessed at all, since all the security rests on it.) (In JWT, instead of a plain secret string, you can sign with RSA, which increases security, but I will not describe all the implementation details; this has turned out long enough as it is.)
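A rough sketch of the scheme, to make it concrete. This is not real JWT: it is a simplified home-made format for illustration only, and for the "hash of data plus secret" it uses HMAC-SHA256 from the standard library rather than a plain hash of the concatenation. The names sign and verify, the exp field and the placeholder secret are all assumptions of the example.

```python
import base64
import hashlib
import hmac
import json
import time

# Placeholder only: a real secret must be long, random and kept on the server.
SECRET = b"replace-with-a-long-random-secret-known-only-to-the-site"


def _b64(data):
    return base64.urlsafe_b64encode(data).decode("ascii")


def _mac(body, secret):
    return _b64(hmac.new(secret, body.encode("utf-8"), hashlib.sha256).digest())


def sign(payload, secret=SECRET):
    """Serialize the payload and append a signature: '<data>.<signature>'."""
    body = _b64(json.dumps(payload, sort_keys=True).encode("utf-8"))
    return body + "." + _mac(body, secret)


def verify(token, secret=SECRET, now=None):
    """Return the payload if the signature is valid and not expired, else None."""
    body, sep, mac = token.partition(".")
    if not sep:
        return None
    # Constant-time comparison, so timing does not leak the expected value.
    if not hmac.compare_digest(mac.encode("utf-8"), _mac(body, secret).encode("utf-8")):
        return None
    payload = json.loads(base64.urlsafe_b64decode(body))
    now = time.time() if now is None else now
    if payload.get("exp", float("inf")) <= now:
        return None
    return payload
```

For example, sign({"user_id": 42, "role": "admin", "exp": time.time() + 1800}) produces a string that verify() accepts for half an hour without touching any database. In a real project it is safer to take a ready-made JWT library than to rely on a sketch like this.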
Advantages:
less server load. The client has already sent all the necessary data; the server only has to compute the hash from this data and the secret string and check that it matches the one that was sent. There is no need to go to the database: the secret string usually sits in some variable close at hand, so everything is fast;
the client can read the JWT itself and find out who it is (if the data is only protected by a hash and not encrypted);
identification does not break when the IP address changes, either.
Problems:
implementation is complicated. If you do everything yourself, you can make a mistake and end up with a security hole, so it is best to use ready-made implementations such as that same JWT;
a "Log me out on all devices" button cannot be implemented at all. To invalidate a string carrying user data, you have to either change the secret string or record somewhere in the database that this particular string with this data has become invalid. But all of that is quite troublesome and negates all the advantages of this identification method. So such strings are usually made short-lived: for example, Google's API issues JWTs that are valid for only half an hour (the expiration date is stored right in the JWT, so there is no need to go to the database);
the information may go stale. For example, if you write into the JWT that the user is an admin and then revoke the admin rights, the site, relying on the JWT data, will keep treating the client as an admin until the JWT itself expires. You can take the information from the database instead, but then it is again simpler to use a random string;
JWT and its analogues are usually long, because they contain all the necessary information; with a large amount of data, the string may, for example, not fit into a cookie.
Supercookies and other fingerprinting
The idea is to use technologies in ways they were not intended for. Every browser and every OS has its own behavioral quirks, and these quirks can be used to identify fairly accurately who has come to the site. For example, they render text slightly differently, and browsers can be told apart by minor pixel differences in the rendered text. I will not describe everything in detail; I will leave links for further reading:
Advantages:
- it is damn hard to get rid of. You can if you really want to, of course, but it is a lot of hassle; it is no longer just a click on the "Clear cookies" button. The client device will be identified regardless of whether it has changed its IP address, cleared its cookies, and so on.
Problems:
accuracy is not one hundred percent. All iPhones are pretty much the same, and telling one iPhone X from another iPhone X is unlikely to succeed (although this applies only to fingerprinting; with supercookies it is simpler);
users will find you and beat you painfully.