CPU: 2 × 2.8 GHz

RAM: 512 MB

Disk: 30 GB

OS: Ubuntu

Technologies used: Java (Spring), Tomcat, PostgreSQL.

I understand perfectly well that the question is not entirely well-posed and that no definitive answer is possible, but I'm interested in at least a rough estimate of what one can hope for.

A few more clarifications. Let's take, for example, the simplest note-storage service (since my application is similar in functionality):

  • Users add / edit / delete notes; there are also sections and per-user account information.
  • Assume the application is written correctly, does not load the CPU much, and has no memory leaks.
  • The same for the database: all queries are optimized and the proper indexes are in place.
  • Thanks to offline mode, a user can make various changes locally and then send the whole change snapshot to the server. The synchronization interval is configurable; let's assume to start that this is 1 request per minute.

Given all of the above, how many active users (constantly making requests to the server) can a server with this configuration handle at the same time? A rough range of values, even with a large margin of error, is fine.

It is also interesting how you would calculate the minimum server configuration if, for example, you knew the technology stack, the number of users and other factors in advance. Perhaps there are interesting articles or a book on this subject.

    2 answers

    After a bit of googling you can find this article: http://qnatech.wordpress.com/2010/03/05/how-much-traffic-can-my-web-application-handle/

    It starts from the fact that Tomcat has 200 worker threads by default, and it seems to me that 0.2 seconds is a normal processing time for a single request (although with this amount of memory I would raise that to 1 second :) because for Java this is not memory, it is sclerosis).

    Based on this, we get from 200 to 1000 requests per second.

    Since a request is planned once per minute, that gives from 200 × 60 = 12,000 to 1000 × 60 = 60,000 users online.

    Since not all users are online at the same time, you can in fact serve more. It is usually assumed that 5-7% of users are online at any moment, which is roughly a 20-fold multiplier: 240,000 to 1,200,000 users in total. But I think you will hit the disk limit first.
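    For reference, the same arithmetic as a small sketch; every input (200 workers, 0.2-1 s per request, one sync request per minute, ~5% of users online) is just the assumption stated above, not a measured value:

    ```java
    // Back-of-the-envelope estimate from the figures above.
    public class CapacityEstimate {
        public static void main(String[] args) {
            int workers = 200;                // Tomcat default worker pool
            double fastRequestSec = 0.2;      // optimistic processing time
            double slowRequestSec = 1.0;      // pessimistic (little RAM)

            double minRps = workers / slowRequestSec;  // 200 req/s
            double maxRps = workers / fastRequestSec;  // 1000 req/s

            // one sync request per minute per online user
            double minOnline = minRps * 60;   // 12,000
            double maxOnline = maxRps * 60;   // 60,000

            // assume ~5% of registered users are online at any moment (~20x)
            System.out.printf("req/s: %.0f..%.0f, online: %.0f..%.0f, total: %.0f..%.0f%n",
                    minRps, maxRps, minOnline, maxOnline, minOnline * 20, maxOnline * 20);
        }
    }
    ```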

    You can also approach it from the other side. Say one note is a kilobyte (without images). The average user will keep about a hundred notes (some more, some less), so roughly 100 KB per user (neglecting the profile and overhead). 30 GB / 0.1 MB gives about 300,000 users (and you still need space somewhere for Java itself and for logs...).
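    And the disk-side estimate as a sketch, with the same assumed figures (about 1 KB per note and about a hundred notes per user), ignoring space for the OS, Java and logs:

    ```java
    // How many users fit on 30 GB at roughly 100 KB of notes per user.
    public class DiskEstimate {
        public static void main(String[] args) {
            long diskBytes = 30L * 1024 * 1024 * 1024;  // 30 GB
            long bytesPerUser = 100 * 1024;             // ~100 notes of ~1 KB

            System.out.println("Users that fit on disk: " + diskBytes / bytesPerUser);
            // prints roughly 300,000 (314,572 with binary units)
        }
    }
    ```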

    Therefore, with the current hardware, I would not count on more than 50-100 thousand users.

    • Thanks for the answer! The article is very useful and answers a number of my questions. - Lookingfor

    The calculation should be done for a specific request, since different requests take different amounts of time to execute. You also need to understand that the calculations are very approximate and the actual figures may depend on many factors (for example, file system fragmentation).

    In addition, we assume the usual request-processing model in Tomcat: the worker model (one thread per connection). With that, the task reduces to calculating the size of the Tomcat worker thread pool.

    Since you have two cores, the guaranteed number of requests processed truly simultaneously without an increase in processing time is 2, or 4 if hyper-threading is available.

    But exactly simultaneous execution is not what interests us, since in practice requests are interleaved in time. The OS scheduler allocates a time slice to each thread, then hands the CPU to another thread, and so on, constantly switching between threads. Moreover, your threads mostly do not burn CPU time but wait on I/O, i.e. in effect they do nothing (while still occupying a worker thread).

    So another metric is needed, namely throughput: how many requests per second one worker can process.

    Let's assume our reference request takes Treq ms (this is latency, or nearly so). That means one worker can process about 1000 / Treq requests per second. Multiply this by the number of cores (counting each HT thread as a core) and you get the ideal throughput of the server (i.e. without any performance degradation). In your case, 2 × (1000 / Treq).
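    As a concrete instance of that formula (the Treq value here is just an assumed example, not a measurement):

    ```java
    // Ideal throughput = cores * (1000 / Treq), assuming no contention at all.
    public class IdealThroughput {
        public static void main(String[] args) {
            int cores = 2;        // count HT threads as cores too, if present
            double treqMs = 200;  // assumed latency of the reference request, ms

            double perWorker = 1000.0 / treqMs;        // 5 req/s per worker
            System.out.printf("ideal throughput: %.1f req/s%n", cores * perWorker);
        }
    }
    ```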

    Now recall that the scheduler constantly switches cores between threads, and that threads often sit waiting for I/O operations. This lets us relax a bit and apply a multiplier to our formula; let's call it k.

    In the ideal and most conservative case, k = 1. A value of k = 2 is often used, and the bravest push it to 8 or even 16. (The truly brave do not calculate anything at all and leave the default, which for Tomcat is 200 workers.) But you need to understand that the more threads running in the system, the higher the contention (fight for resources) between them, and that inevitably increases latency. So again a compromise is needed: you can sacrifice latency as long as it stays within limits we consider acceptable (for example, it fits the SLA). Say a request is processed in 50 ms and for some reason 200 ms still satisfies us; then you can increase the number of workers and support a larger number of "simultaneous" connections.
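    A small sketch of how the worker pool and the "ideal" throughput scale with k; the k values and the 50 ms latency are just the examples from the text, and the growth of latency under contention is not modeled here:

    ```java
    // Worker pool size = k * cores; throughput grows with it only as long as
    // the extra latency caused by contention still fits the SLA.
    public class WorkerPoolSizing {
        public static void main(String[] args) {
            int cores = 2;        // 4 if hyper-threading is counted
            double treqMs = 50;   // assumed latency of the reference request

            for (int k : new int[] {1, 2, 8, 16}) {
                int workers = k * cores;
                double rps = workers * (1000.0 / treqMs);
                System.out.printf("k=%2d -> %3d workers, up to %.0f req/s%n",
                        k, workers, rps);
            }
        }
    }
    ```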

    So, after running a series of experiments, we decided the server would get 500 workers, because we are bold. But under load we started seeing strange problems: latency constantly exceeds the limits set in the SLA while the CPU is only 10% loaded. Most likely we have hit the limit of the disk (especially if it is an HDD; you cannot argue with mechanics). Note that the disk we hit may be on another machine (the DBMS is usually installed on a separate server). It is impossible to say exactly what measures to take here, because there are many of them, from caching data in the application to sharding the DBMS or changing the storage entirely.

    It is also possible to hit the limits of components that the application accesses externally, for example over the network.

    Keep in mind two things:

    1. The calculation method described above is valid for a synchronous style of interaction.
    2. The validity of this approach is subjective and not the ultimate truth, since it is built on my personal experience.
    • Thank you very much for your answer! Very accessible and clear. As for the possibility of processing requests simultaneously, I have figured that out thanks to the previous answer and your description. But I am still interested in RAM, which in my case can also become a problem under load (since there is very little of it). What can you say about that? Exact figures and formulas probably cannot be given here (everything depends on the project), but relying on your experience I think you can estimate it: how much will memory consumption grow as the load increases? - Lookingfor
      @Lookingfor This question is easier to answer by analyzing the running application. Create a load similar to the expected one and analyze the heap size (for example, via JMX) and the GC activity (by enabling the GC log). As the load grows, the JVM's memory appetite will change, so you can roughly estimate the amount of memory required for normal operation and work out a strategy for optimizing the application's memory use and/or tuning the GC. It is worth noting that although in Java you can get quite far with memory optimization, buying additional memory may turn out to be the cheaper solution in the end. - a_gura
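      A minimal sketch of the kind of heap inspection described in the comment above, using the standard MemoryMXBean; the sampling interval is arbitrary, and in practice the same numbers are usually read remotely over JMX (VisualVM, JConsole) together with a GC log while the test load is running:

      ```java
      import java.lang.management.ManagementFactory;
      import java.lang.management.MemoryMXBean;
      import java.lang.management.MemoryUsage;

      // Periodically prints heap usage of the current JVM so you can watch
      // how its memory appetite grows under a test load.
      public class HeapSampler {
          public static void main(String[] args) throws InterruptedException {
              MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
              while (true) {
                  MemoryUsage heap = memory.getHeapMemoryUsage();
                  System.out.printf("heap used=%d MB, committed=%d MB, max=%d MB%n",
                          heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
                  Thread.sleep(5_000);  // sample every 5 seconds (arbitrary)
              }
          }
      }
      ```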