I try to choose the most suitable queue server for this task: There is an algorithm that runs for a certain interval, you need to split this interval and make calculations to workers (the number of which is desirable to set dynamically depending on the size of the interval), then wait until all the workers are done and perform return values.

I read that a gearman is basically suitable for this, but I would also like to hear your opinions. I also read about the popular rabbitMq , but I didn’t give it the opportunity to ask several workers .

The programming language, in principle, is not important, the main thing is that it supports multi-threading (multi-core).

  • This is reminiscent of MapReduce. - andreycha

2 answers 2

RabbitMQ will allow you to connect several consumer to one queue (see http://www.rabbitmq.com/tutorials/tutorial-two-python.html Fair Dispatch). Regarding the dynamic consumer - here your application should decide when to start them and how to turn them off. You can run the count of the summers equal to the number of cores on the machine and then collect the data.

And your task resembles the implementation of MapReduce. Maybe you should look at Hadoop?

If you still want to write it yourself, then regarding the language, you can choose Erlang / OTP (RabbitMQ is written on it). Working with threads there is built very well and it is easy to combine several machines into a cluster, with the sharing of threads. Your application will not think about where and what is being performed, but simply wait for the completion of all the workers and return the result. But you have to break the brain before you write a functional code :)

    From what has already been advised - RabbitMQ. If the load is small and you do not need to guarantee delivery, drill it. In the case of Rabbit, accept the fact that messages can be lost, accept the fact that at high loads the cluster will start to fail, accept the fact that there will be problems when sharing a network.

    You can still take the Apache Kafka. It’s better with it, but it’s very difficult to tune it so that it works well.

    Hadoop or Spark is suitable for your task.

    If you are hosted on Amazon, then with their lambdas it seems like this can be done.