Suppose we are writing a REST endpoint in Ruby on Rails that lets the user delete records. We expect users to send requests that delete records in bulk. The system must perform well and has to be built quickly. What should we pay attention to first when designing it?

I once wrote a similar interface for deleting a user from a website. Along with the user, all of their data was deleted in a cascade (on average 1000 photos, each stored in 4 copies on AWS), which simply froze the server for 30 seconds. I solved the problem with a workaround that spawned a separate thread for the delete operation. I would rather not make the same mistake again.

  • Creating a separate thread is not really a workaround. But it depends on how much you need to delete at a time. And most likely everything will be bottlenecked by the database. - KoVadim


On deferred execution of long-running actions

Since you can already run this action in a separate thread, you do not need to finish it before sending a response to the client. For actions like this it is customary to use a task queue. The canonical example is sending e-mail, but you have found an action of a slightly different nature that is just as suitable.

A task queue gives you control over how many such heavy tasks run in parallel, and you can tune the number of workers so that the system does not go down. Moreover, you can change this number while the system is running (this is more of an administration task).

The structure of such queues is usually as follows:

  • There is a task store. It can be a specialized message broker (RabbitMQ) or a general-purpose data store (Redis, PostgreSQL). It holds the tasks waiting to be executed.
  • There is a server (worker process) that takes tasks from the store one at a time (the store may impose an order, but does not have to) and performs what each task describes.
  • There is a client that adds tasks to the store.
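The three roles above, together with the two ideas from the preceding paragraphs (answer the client before the work is done, and cap how many heavy tasks run at once), can be sketched with Ruby's standard library alone. This is a minimal in-process sketch under stated assumptions, not how any of the libraries below work: `DeletionQueue`, `enqueue`, `shutdown` and the `[:delete_user, id]` task format are hypothetical names, `Thread::Queue` stands in for the external task store, and `sleep` stands in for the slow deletion.

```ruby
# Hypothetical in-process task queue: one store, a client method, N workers.
# Standard library only; Thread::Queue stands in for Redis/RabbitMQ/PostgreSQL.
class DeletionQueue
  attr_reader :processed, :max_parallel

  def initialize(workers:)
    @store = Thread::Queue.new   # task store
    @lock = Mutex.new            # guards the counters below
    @running = 0                 # tasks being performed right now
    @max_parallel = 0            # highest value @running has reached
    @processed = []              # tasks that have finished
    @threads = Array.new(workers) { Thread.new { work } }
  end

  # Client side: put the task into the store and return at once,
  # without waiting for the task to be performed.
  def enqueue(task)
    @store << task
    :accepted
  end

  # Queue one stop marker per worker behind the pending tasks, then wait.
  def shutdown
    @threads.size.times { @store << :stop }
    @threads.each(&:join)
  end

  private

  # Worker side: take one task at a time from the store and perform it.
  def work
    while (task = @store.pop) != :stop
      @lock.synchronize do
        @running += 1
        @max_parallel = [@max_parallel, @running].max
      end
      perform(task)
      @lock.synchronize do
        @running -= 1
        @processed << task
      end
    end
  end

  # Placeholder for the slow part (e.g. deleting a user's photos from storage).
  def perform(_task)
    sleep 0.01
  end
end

queue = DeletionQueue.new(workers: 2)
(1..6).each { |id| queue.enqueue([:delete_user, id]) }
queue.shutdown
p queue.processed.size # => 6
```

The pool size (`workers:`) is the knob that limits parallelism: no matter how many deletions are requested, at most two run at once here. Because this store lives in process memory, queued tasks are lost when the process restarts; that is one reason real task queues keep tasks in an external store and run the workers as separate processes.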

Among current implementations in the Ruby ecosystem, I can suggest the following:

  • ActiveJob, a common adapter layer over different queue implementations, introduced in Rails 4.2. Predictably, almost all backend-specific functionality is either buried deep or unavailable.
  • Sidekiq is the most popular option. Tasks are stored in Redis and executed by worker processes started specifically for this purpose. It supports nice extras such as scheduling tasks for a given time.
    • Reliability, unfortunately, is so-so: a worker takes a task from the store and removes it there atomically as part of the fetch, so that one task cannot be grabbed by several workers. If the process executing the task crashes, the task simply vanishes. A fix exists in the paid version, but, well, you get the idea.
  • Hutch is a lesser-known option. In essence, it is a simplified client for the RabbitMQ message broker. Simplified because its feature set is the bare minimum and all messages go through a single topic exchange.
    • A topic exchange can be used for more than task queues, so Hutch is positioned not as a task queue but as a means of exchanging messages between services.
  • Que is a fairly new and interesting option. Tasks are stored in a dedicated PostgreSQL table, so they get all the ACID guarantees together with the rest of your data and can be changed in the same transactions.
    • Tasks are "captured" through advisory locks: locks that the database itself does not enforce, whose meaning is defined entirely by the application. A lock is held in memory and tied to the database connection. If the connection dies, the lock is released and the record becomes available to other workers. This has a downside: Que workers cannot be run with database access through pgbouncer.
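The difference between the Sidekiq and Que capture strategies described above comes down to what happens when a worker dies mid-task. The following is a hypothetical in-memory model, not either library's code: `DestructiveFetchStore` mimics a fetch that removes the task from the store, and `ConnectionLockStore` mimics a lock owned by a connection, where `disconnect` plays the role of a dropped PostgreSQL session releasing its advisory locks. All class and method names are made up for illustration.

```ruby
# Hypothetical model of two ways to "capture" a task (not real library code).

# Sidekiq-style basic fetch: taking a task removes it from the store.
class DestructiveFetchStore
  attr_reader :tasks

  def initialize(tasks)
    @tasks = tasks.dup
    @mutex = Mutex.new
  end

  # Atomic pop: no two workers can receive the same task,
  # but once popped, the task exists only inside the worker.
  def fetch
    @mutex.synchronize { @tasks.shift }
  end
end

# Que-style capture: the task row stays in the table; a worker only holds
# a lock on it, and that lock belongs to the worker's connection.
class ConnectionLockStore
  attr_reader :tasks

  def initialize(tasks)
    @tasks = tasks.dup
    @locks = {}            # task => connection currently holding its lock
    @mutex = Mutex.new
  end

  # Returns the first task nobody has locked, locking it for `connection`.
  def fetch(connection)
    @mutex.synchronize do
      task = @tasks.find { |t| !@locks.key?(t) }
      @locks[task] = connection if task
      task
    end
  end

  # The task row is deleted only after the work has finished successfully.
  def complete(connection, task)
    @mutex.synchronize do
      raise ArgumentError, "lock not held" unless @locks[task] == connection
      @locks.delete(task)
      @tasks.delete(task)
    end
  end

  # A dead connection releases all of its locks; its tasks become visible again.
  def disconnect(connection)
    @mutex.synchronize { @locks.delete_if { |_task, conn| conn == connection } }
  end
end

sidekiq_like = DestructiveFetchStore.new([:delete_photos_42])
sidekiq_like.fetch            # the worker takes the task, then its process dies
p sidekiq_like.tasks          # => []

que_like = ConnectionLockStore.new([:delete_photos_42])
que_like.fetch(:conn_a)       # worker A locks the task, then dies
p que_like.fetch(:conn_b)     # => nil (lock still held)
que_like.disconnect(:conn_a)  # connection dropped, lock released
p que_like.fetch(:conn_b)     # => :delete_photos_42
```

The model also hints at the pgbouncer problem: it assumes a worker keeps one connection for as long as it holds a lock. In pgbouncer's transaction pooling mode, consecutive transactions from one client may run on different server connections, so a session-level advisory lock no longer reliably belongs to the worker that took it.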

There are, of course, others; the list of backends supported by ActiveJob is a good place to look.