Hello. I am currently building a large site in PHP, and I'm wondering how the architecture is implemented in sites like VKontakte. As far as I know, VKontakte manages to run on dozens of servers. So, how do you decompose a site across several servers?
Closed because the question needs to be reformulated so that participants can give it an objectively correct answer, by Ipatyev, user194374, VenZell, aleksandr barakin, Grundy Mar 13 '16 at 17:20.
The question invites endless debate and discussion based on opinions rather than knowledge. To get an answer, rephrase your question so that it can be given an unambiguously correct answer, or delete the question altogether. If the question can be reformulated according to the rules set out in the help center, edit it.
- And where did you read that? About a dedicated server for likes and such? Answering a like takes 2-3 database queries. Does that really need a whole server? - Alexey Shimansky
- Maybe I'm confusing something; perhaps their database is distributed across servers? - Accami
- Everything of theirs is distributed. The question itself is valid, but you have to understand that if a person lacks the skill to type "VKontakte architecture" into Google yet intends to build something like that, one has to wonder how adequately he assesses the reality around him. - Ipatiev
- one ru.SO question about social networks and performance; video libraries: the HighLoad conference, technomarket mail.ru (there are several highload videos there); highload websites: the architecture of various large sites / social networks; web application optimization and scaling + the HighLoad conference (and also their "Thoughtful Optimization") - BOPOH
1 answer
The resources of any server are finite. Each client connection consumes memory, CPU time, and a share of the network channel. Therefore, when you have a huge number of users and connections, use several servers and distribute the load between them.
The first thing that comes to mind is to separate the static files (CSS, JavaScript), the application, and the database, placing each on its own server to start with.
Next, grow the pool of static-file servers. Allocate a separate domain for them, and either put a load balancer in front of them or assign each server its own subdomain and have the application pick one at random. If the load on statics grows, you simply add servers to the statics pool.
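The random-subdomain trick above can be sketched in a few lines (a minimal illustration; the host names and helper function are made up for the example):

```python
import random

# Illustrative pool of static-file subdomains (hypothetical names).
STATIC_HOSTS = ["st1.example.com", "st2.example.com", "st3.example.com"]

def static_url(path: str) -> str:
    """Build an asset URL on a randomly chosen static host,
    spreading requests evenly across the statics pool."""
    host = random.choice(STATIC_HOSTS)
    return f"https://{host}/{path.lstrip('/')}"
```

Adding capacity then amounts to appending another host to the list (or to the balancer's upstream).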
Next, scale the application servers by growing their pool as well. You cannot afford to store users' media files on an application server: if an avatar is uploaded to one server, you would have to copy it to every server in the pool, and with hundreds of servers you would saturate all your network channels with transfers. Therefore, keep all media files on separate servers, preferably in a scalable object storage (Swift, S3, Ceph). Then the application servers hold nothing but code, and their number can likewise be increased behind a load balancer.
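One common way to lay files out in such an object store is content addressing: derive the key from a hash of the file itself, so every application server computes the same key and no server is special (a sketch under that assumption; the function name is illustrative):

```python
import hashlib

def media_key(data: bytes, ext: str) -> str:
    """Content-addressed key for an object store (S3, Swift, Ceph).
    A two-level hex prefix spreads keys across the namespace."""
    digest = hashlib.sha256(data).hexdigest()
    return f"{digest[:2]}/{digest[2:4]}/{digest}.{ext}"
```

Identical files naturally deduplicate, and any application server can reconstruct the key without asking a central registry.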
Next comes the database, the most difficult part. Almost any database has a replication mechanism: you chain the databases together, write data to a single server (the master), and replication propagates it to all the slave servers. There are two problems here: the data appears on the slaves with some delay, and this scales only reads; writes are not scaled at all. If only a small number of your users write, you can still live with a single master; if there are millions of them, writes have to be scaled differently.

There are several mechanisms. One is ring replication: you join several master servers, say 10, into a ring, and every tenth id is written to its own server, from which replication carries it to the other masters (I have not seen this mechanism live; it is very painful for many reasons). Another is to hash the user, for example by name, and spread users across servers: say you have 256 database servers; compute the md5 hash of the username and take its first two hex characters, which give exactly 256 values. All users with a given prefix are served by the corresponding database server. In practice, try to choose a mechanism that distributes users evenly.
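The hash-based sharding described above, 256 servers keyed by the first two hex characters of an md5 hash, could look roughly like this (a sketch; the function name is illustrative):

```python
import hashlib

NUM_SHARDS = 256  # one database server per two-hex-character prefix

def shard_for_user(username: str) -> int:
    """Map a username to a shard number 0..255 using the first
    two hex characters of its md5 hash."""
    prefix = hashlib.md5(username.encode("utf-8")).hexdigest()[:2]
    return int(prefix, 16)
```

The mapping is deterministic, so every application server routes a given user to the same database without any coordination.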
After that, individual application modules are split out into services: advertising gets its own pool of servers, comments theirs, news theirs, counters theirs.
But it is better not to do this too early, only once the load on a subsystem is actually visible: if you have 3 banners, you do not need a separate ad subsystem; if you do not need real-time counters, you do not need a separate system for them. You would just waste time and effort. However, designing the application from the start so that the database, the application servers, the statics, and the media files can be served by different servers, or better yet by server pools, is a good idea.