The number of instances is calculated from the load the tank must generate. For example, at 5000 rps with a server that responds in 250 ms on average, the tank needs at least 1250 instances to deliver the required load.
The logic is as follows: with an average response time of 250 ms, one instance can send a request and receive the answer 4 times per second. Dividing the required 5000 rps by the 4 requests per second that one instance handles gives 1250. If the server takes 3 seconds on average to respond, you need more than 15000 instances: in the first second the tank uses the first 5k instances, which send their requests and stay busy waiting for responses; in the second second the next 5k become busy, and so on. Only in the fourth second of the test do some of the first 5k instances become free and available for reuse.
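The arithmetic above can be written as a small helper. This is only a sketch: the function name `required_instances` and its parameters are hypothetical and not part of the tank. It gives the lower bound implied by the reasoning above, so in practice you may want some headroom on top of it.

```python
def required_instances(rps: int, avg_response_ms: int) -> int:
    """Lower bound on the number of instances needed to sustain `rps`.

    One instance completes 1000 / avg_response_ms requests per second,
    so the bound is rps * avg_response_ms / 1000, rounded up.
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    return (rps * avg_response_ms + 999) // 1000


print(required_instances(5000, 250))   # 250 ms average response time
print(required_instances(5000, 3000))  # 3 s average response time
```

For the two cases discussed above, this yields 1250 and 15000 instances respectively.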
You can reduce memory consumption by telling phantom to allocate less memory for reading the response, using the phantom_http_* options ( https://yandextank.readthedocs.org/en/latest/configuration.html?#options ). The response must still fit within this limit; otherwise, instead of terminating sessions correctly, the tank will send an RST and close the connection prematurely.
If it is not already set, it may also help to add the [Accept-Encoding: gzip, deflate] header to the requests to reduce the size of the response.
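To get a rough feel for how much compression can shrink a response, the sketch below compresses a made-up JSON body with Python's standard gzip module. The payload is hypothetical and deliberately repetitive; real savings depend entirely on what your server returns.

```python
import gzip
import json

# Hypothetical response body (an assumption for illustration only).
body = json.dumps(
    [{"id": i, "status": "ok"} for i in range(1000)]
).encode("utf-8")

# Compress the body the way a server honoring "Accept-Encoding: gzip" might.
compressed = gzip.compress(body)

print(f"plain: {len(body)} bytes, gzip: {len(compressed)} bytes")
```

Whether the memory limit has to cover the compressed or the decompressed size is not stated here, so check it against the phantom documentation before lowering the limit.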