I know that for server-to-client notifications you need WebSockets, long polling, or something similar. But how do you notify one user about a new message when a completely different user sent it, and the recipient may be connected to another server or even another data center?
Here is the scenario:
User Peter opens a WebSocket connection to a server in the European data center.
User George opens a WebSocket connection to a server in the American data center.
Peter sends a message to George, and George should receive it with minimal delay.
How do I implement this correctly?
I have a few ideas, but I'm not sure which one to choose. Perhaps this is done differently altogether. Where should I dig?

Idea #1
When any user opens a connection, we write a record into a distributed store: user_id → server_address, where server_address is the address of the server that accepted the WebSocket. Since this information is small, it can all be kept in RAM, say a Redis or Memcached cluster with replication fronted by twemproxy, or a ready-made Redis Cluster. We could also keep extra state there at the same time, e.g. whether the user is online or offline.

When Peter sends a message to George, we look up the address of George's server in Redis and send the message directly to that server. George's server receives the message and delivers it to George over his WebSocket.
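The lookup table above can be sketched like this. A plain dict stands in for the distributed Redis/Memcached cluster, and every name here is illustrative, not a real client API:

```python
# Sketch of Idea #1's user_id -> server_address table. In production this
# would be a replicated Redis cluster; a dict stands in for it here.

class ConnectionRegistry:
    """Maps user_id to the address of the server holding that user's WebSocket."""

    def __init__(self):
        self._store = {}  # in production: a Redis cluster, not process memory

    def on_connect(self, user_id, server_address):
        # Record which server accepted this user's WebSocket connection.
        self._store[user_id] = server_address

    def on_disconnect(self, user_id):
        self._store.pop(user_id, None)

    def route(self, recipient_id):
        # Address of the server to forward a message to, or None if offline.
        return self._store.get(recipient_id)

registry = ConnectionRegistry()
registry.on_connect("george", "ws-us-east-1.example.com")
```

With Redis itself, `on_connect` would be a `SET` with a TTL (refreshed by heartbeats) so that entries for crashed servers expire on their own.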

On top of this we can add a local cache of addresses. It is likely that the users connected to one server will mostly message the same set of people, so when sending we can cache the recipients' server addresses and invalidate an entry when delivery fails. But this is premature optimization, I guess.
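The cache-with-invalidation idea can be sketched as follows; `AddressCache` and its methods are hypothetical names, and the "registry" is anything with a dict-like `get`:

```python
# Per-server cache in front of the shared user_id -> server_address store.
# Look up locally first, fall back to the shared registry, and drop the
# entry when a forward fails so the next lookup re-reads the registry.

class AddressCache:
    def __init__(self, registry):
        self._registry = registry  # shared store (e.g. the Redis mapping)
        self._local = {}           # in-process cache on this server

    def lookup(self, user_id):
        if user_id not in self._local:
            self._local[user_id] = self._registry.get(user_id)
        return self._local[user_id]

    def invalidate(self, user_id):
        # Called when delivery to the cached server fails (the user
        # reconnected elsewhere or the server went down).
        self._local.pop(user_id, None)

shared = {"george": "ws-7.example.com"}
cache = AddressCache(shared)
cache.lookup("george")  # cached from the shared store
```

The failure path is the important part: a stale entry is only detected when a send fails, which is acceptable here because the sender retries through the registry after invalidating.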

Idea #2
Use something like Apache Ignite. I still don't understand exactly how the data would be distributed and processed there, but you can create subscribers and producers and attach filters that select which messages go to which consumers. The filters are what I got stuck on: it is not clear how they would be evaluated or how to write them correctly.
As I understand it, this option is better if you can figure it out, because the data can also be kept in RAM and Apache Ignite can (probably, but I'm not sure) deliver a message directly to the server that needs it. At the same time, the number of inter-server connections would be much smaller: in the first case we have N * (N - 1) connections, in the second only N, where N is the number of servers.
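The "filters" can be thought of as predicates the broker evaluates on each message, so only matching subscribers receive it. This is a plain-Python sketch of that concept, not Apache Ignite's actual API:

```python
# Conceptual sketch of filtered pub/sub: each server subscribes with a
# predicate describing which messages it wants (e.g. "addressed to one of
# my connected users"), and the broker fans out only matching messages.

class Broker:
    def __init__(self):
        self._subs = []  # list of (predicate, callback) pairs

    def subscribe(self, predicate, callback):
        self._subs.append((predicate, callback))

    def publish(self, message):
        for predicate, callback in self._subs:
            if predicate(message):
                callback(message)

broker = Broker()
inbox = []

# This server only hosts George's WebSocket, so it filters on that:
local_users = {"george"}
broker.subscribe(lambda m: m["to"] in local_users, inbox.append)

broker.publish({"to": "george", "text": "hi"})  # delivered to inbox
broker.publish({"to": "peter", "text": "yo"})   # filtered out
```

This also shows why the connection count drops to N: each server keeps one subscription to the broker instead of dialing every other server.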

Idea #3
Same as Idea #1, but instead of sending the message directly, we put it into a queue. Each data center has its own local RabbitMQ cluster, and each server creates a queue in that cluster and subscribes to it. When we want to send a message from Peter to George, we check which data center George's server belongs to. If it is local, we publish the message to the appropriate server's queue. If it is in another data center, we forward the message there, and any server in that data center publishes it to its local RabbitMQ cluster. In this case we have N * 2 + N * (M - 1) connections, where N is the number of servers and M is the number of data centers.
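The routing decision in Idea #3 can be sketched in memory. Real code would publish to RabbitMQ (e.g. with the pika client); here plain lists stand in for the per-server queues so the local-vs-remote branch is easy to follow, and all names are illustrative:

```python
# Idea #3 routing: local data center -> publish straight to the recipient
# server's queue; remote data center -> hand off to that DC's relay, which
# re-publishes into its own RabbitMQ cluster.

from collections import defaultdict

LOCAL_DC = "eu"
queues = defaultdict(list)  # queue name -> pending messages (stand-in for RabbitMQ)

def queue_name(dc, server):
    return f"{dc}.{server}"

def route(recipient_dc, recipient_server, message):
    if recipient_dc == LOCAL_DC:
        # Same data center: the recipient's server is subscribed to this queue.
        queues[queue_name(recipient_dc, recipient_server)].append(message)
    else:
        # Other data center: any server there picks this up and publishes it
        # to the local RabbitMQ cluster of that data center.
        queues[f"{recipient_dc}.relay"].append(message)

route("eu", "ws-3", {"to": "peter", "text": "hi"})
route("us", "ws-7", {"to": "george", "text": "hello"})
```

The queue layer buys durability and backpressure over Idea #1's direct sends, at the cost of an extra hop per message.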

How should such a notification system be built? Perhaps all my ideas are wrong and there are more elaborate schemes.

  • Via WebSocket you can connect to any server, and even to several servers at once, so it's easier to dedicate one server to messaging and work through it - Vadim
  • True, but that won't work in a distributed project with many users scattered around the world. You can't connect all users to a single server. - Alexandr
  • Why not? Why would the server fall over? With WebSockets the server load is minimal, and there is no hard limit on the number of connections. - Vadim
  • Yes, the server will fall over. When performing slow tasks it won't keep up with the load. If we create a thread per user per task, we'll spend a lot of CPU time on context switching, and the server will just hang. - Alexandr
  • If instead we create a thread pool of, say, 200-400 threads, then once the pool is saturated our users will start waiting for threads to be freed. Since users arrive frequently and there are only 200-400 threads, after some time the backlog will overflow and subsequent requests will start failing. - Alexandr
