Good day!

There is a data processing application. Clients connect to it and actively send data. On average, 5-10k messages per second from one client must be processed on one node. The average message size is 700-1100 bytes. We ran into socket buffer overflows and increased the buffer sizes. Still, the throughput is not always enough: above 3k messages per second the server starts to lag and eats 100% of one core.

Question: how is it best to organize the architecture? Has anyone dealt with this kind of thing? Where should I look, what should I read?

PS Naturally, both server and client run Linux.

  • 2
    > above 3k, the server starts to lag and eat 100% (1 core). Have you already parallelized the load across all the cores? Or do you only have 1 core working? - mega
  • 1
    By "message", do you mean a portion of data written to the socket? I.e., essentially just a stream of up to 10,000 × 1,100 bytes/s from one client? And below 3,000 packets per second, is the CPU load significantly lower? - Michael M
  • Yes, it is a portion of data in the socket. At 3,000 packets/s one core is loaded to 100%. The question is how best to offload/parallelize this task, because we not only read from this socket, we also monitor it and respond on it right there. - Fe1iX
  • 1
    An indiscreet question: have you looked with a profiler, even the simplest one, to see where it stalls? It may be stalling on something very simple. My favorite example is appending elements to an array one at a time, with a memory reallocation on every append. - KoVadim
  • We did. VTune really helped. Still, the most expensive part of it all is parsing and decoding/encoding the messages. - Fe1iX

1 answer 1

If you have the capacity for it — that is, the typical number of simultaneously connected clients is significantly smaller than the number of cores — then you can, for example, do the following.

The thread serving the socket reads the data and puts it into a task queue. Several other threads take tasks from this queue, process the data, and put the results into a results queue. The first thread takes everything from the results queue and sends it back to the client. All of this must be implemented carefully, otherwise you will get problems with locks, memory access conflicts, and so on. Signal the arrival of new data with semaphores. If they turn out to be the bottleneck, you can go to the trouble of a manual implementation via shared variables (difficult and tedious, but I heard a similar success story from a colleague).

Or, as an alternative, you can hand out tasks to the handler threads in turn. Each thread working with a socket knows its 4 handler threads (4 follows from 10,000 / 3,000) and assigns them tasks round-robin.

But to begin with, I would recommend thinking about optimizing the processing itself. It would be much more pleasant to fit all of one client's data processing onto a single core.

  • I was thinking along the same lines. It looks like I will have to go with 2 queues after all. There is one more consideration: there are twice as many incoming messages as outgoing, and they also require decoding. Still, I too lean toward 2 queues with the processing threads behind them. - Fe1iX