The calculation should be done for a specific request, since different requests take different amounts of time to execute. Also keep in mind that these calculations are very approximate, and the actual figures may depend on many factors (for example, file system fragmentation).
We also assume the usual model of handling requests in Tomcat: a pool of workers with one thread per connection. With that assumption, the task reduces to calculating the size of the Tomcat worker pool.
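For context, here is a minimal sketch of what that thread-per-connection model looks like in plain Java (the port and pool size are illustrative assumptions; Tomcat's actual implementation is far more involved):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Each accepted connection is handed to one worker thread from a fixed pool;
// sizing that pool is exactly the task discussed below.
public class ThreadPerConnectionServer {
    public static void main(String[] args) throws IOException {
        int workers = 200; // analogous to Tomcat's default of 200 workers
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try (ServerSocket server = new ServerSocket(8080)) {
            while (true) {
                Socket client = server.accept();
                pool.submit(() -> handle(client));
            }
        }
    }

    private static void handle(Socket client) {
        try (client) {
            // read the request, do the work (often waiting on I/O), write the response
        } catch (IOException ignored) {
        }
    }
}
```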
Since you have two cores, the guaranteed number of requests processed truly simultaneously, without any increase in processing time, is 2; if hyper-threading is also enabled, then 4.
But truly simultaneous requests are not what we are interested in, because in practice requests are shifted in time relative to one another. The OS scheduler allocates a time slice to each thread, after which it hands the CPU over to another thread, and so on, constantly switching between threads. Moreover, your threads mostly do not burn CPU time but wait on I/O, i.e. they effectively do nothing useful while still holding a worker thread.
So we need another metric, namely throughput: how many requests per second one worker can process.
Let's assume our reference request executes in Treq ms (this is its latency, or close enough). That means one worker can process about 1000 / Treq requests per second. Multiply this by the number of cores (counting each HT thread as a core) and you get the ideal throughput of the server, i.e. without any performance degradation. In your case: 2 * (1000 / Treq).
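As a quick sanity check, the same arithmetic in Java (cores = 2 and Treq = 50 ms are illustrative numbers, not measurements):

```java
public class ThroughputEstimate {
    public static void main(String[] args) {
        int cores = 2;          // count each HT thread as a core if hyper-threading is on
        double reqTimeMs = 50;  // Treq: time of the reference request, ms

        // One worker processes about 1000 / Treq requests per second
        double perWorker = 1000.0 / reqTimeMs;

        // Ideal server throughput, i.e. without any performance degradation
        double ideal = cores * perWorker;

        System.out.printf("per worker: %.1f req/s, ideal: %.1f req/s%n", perWorker, ideal);
    }
}
```

With these numbers that gives 20 req/s per worker and 40 req/s for the whole server.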
Now recall that the scheduler constantly switches the cores between threads, and that threads often sit idle waiting for I/O. This lets us relax a bit and apply a multiplier to the formula; let's call it k.
In the ideal and most conservative case, k = 1. A value of k = 2 is often used, and the bravest push it to 8 or even 16 (the bravest of all don't calculate anything and leave the default, which for Tomcat is 200 workers). But understand that the more threads running in the system, the higher the contention (competition for resources) between them, and that inevitably drives latency up. So again there is a trade-off: you can sacrifice latency as long as it stays within limits you consider acceptable (for example, it still fits the SLA). Say a request is processed in 50 ms but for some reason 200 ms is good enough for us; then we can increase the number of workers and serve a larger number of "simultaneous" connections.
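Putting this together, a minimal sketch of the pool-size estimate (k = 4, the 50 ms request time and the 200 ms SLA target are illustrative assumptions; the SLA-based scaling is just one way to read the trade-off above):

```java
public class WorkerPoolEstimate {
    public static void main(String[] args) {
        int cores = 2;          // count HT threads as cores if enabled
        int k = 4;              // multiplier: 1 is conservative, 2 is common, 8-16 is brave
        double reqTimeMs = 50;  // measured time of the reference request, ms
        double slaMs = 200;     // latency we are still willing to tolerate

        // Base estimate of the worker pool size
        int workers = cores * k;

        // If the SLA allows requests to take slaMs instead of reqTimeMs,
        // the pool can roughly be scaled up by that ratio
        int workersWithinSla = (int) (workers * (slaMs / reqTimeMs));

        System.out.println("workers (cores * k):    " + workers);
        System.out.println("workers allowed by SLA: " + workersWithinSla);
    }
}
```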
Suppose that after a series of experiments we decided the server would get 500 workers, because we are bold. Under load, however, strange problems begin: latency constantly exceeds the limits set in the SLA, while the CPU is only 10% loaded. Most likely we have hit the limit of disk performance (especially if it is an HDD; you can't argue with mechanics). Note that the disk we hit may well be on another machine (the DBMS is usually installed on a separate server). It is impossible to say exactly which measures to take here, because there are many: from caching data in the application, to sharding the DBMS, to changing the storage entirely.
You can also be bottlenecked by the performance of other components that your application accesses, for example, over the network.
Keep in mind two things:
- The calculation method described above is valid for synchronous (blocking) request handling.
- The validity of this approach is subjective and it is not the ultimate truth, since it is built on my personal experience.