nodeJs sequelize how to get rid of duplicates when writing in parallel

Question

Hello!

the database has a translate table of the form id, language, text the id field is not auto-increment and is the same for translating one phrase in different languages there is a script on NodeJs using express, cluster (currently only 1 worker is working), sequelize in the cluster the Redis client is connected

script accepts sourceLang, tergetLang, phraseToTranslate as input

if this phrase is not in the database - then a request is made to googleTranslateApi of this phrase for all languages that exist in the database

after that, the maximum id in the cache and database is checked, incremented and again stored in the cache.

The reserved id is assigned to this phrase and then written to the database.

This is a problem for single test calls to the script, everything goes fine, but if you send a large number of requests to the script, the same id is often assigned to different phrases

As far as I understand, the server responds to several requests in parallel and thus receives the same maxId - how to avoid it?

there should be three id in this case - as there are only three languages

If you are given an exhaustive answer, mark it as correct (a daw opposite the selected answer).

Accepted Answer · 2016-11-22T19:29:14

TL; DR: Sequential surrogate number keys are not needed!

Let's first understand the common approach to localization.

Traditionally, localization uses a string key and one or more translations for this string. As a key, either a string in the main natural language (usually English) is usually used. This key string can contain zero or more patterns where real values are substituted. For example, in your system there may be such a localization constant:

 Hello %s!

Which may have the following translations:

 RU: Привет %s! ES: ¡Hola %s!

A typical translation storage scheme is:

 lang | key | trans -----+-----------+----------- RU | Hello %s! | Привет %s! ES | Hello %s! | ¡Hola %s!

Why am I telling all this? To the fact that you lose sight of one very important point: you do not need a surrogate key (your id field) to work with localization constants. Instead, it is more correct to use the natural key - the string to be translated. Additionally, it is necessary to enter a unique key at the database level (язык + переводимая строка) .

This will allow, on the one hand, to discard the meaningless numeric key from the database, and, on the other hand, to simplify the translation process and use natural language strings in the application code. For example, gettext uses this format:

 console.log(_('Hello %s!', 'John Doe'));

_{If I did not convince you, read on.}

The process of obtaining the elements of a numerical sequence in a distributed system is a very difficult task. Personally, I know only two acceptable ways to solve it:

Implement centralized service issuing sequence elements
Abandon the sequence and use the Globally Unique Identifiers (UUID) as identifiers.

I don’t want to describe the first way, since it is rather complicated in a fault-tolerant / scalable implementation (for example, you did not succeed). Instead, I’ll tell you more about the UUID.

The main idea of UUID is to use values with high uniqueness as identifiers. With this approach, you do not need a centralized service that generates ID records - clients themselves generate as many identifiers as they need, without fear of conflict. There are now 5 different UUID generation standards (all described in RFC4122 ). I would draw your attention to UUIDv4 - since it is he who is completely unique and does not depend on the environment in which the generation takes place.

As for node.js, in the npm ecosystem there is a uuid module that takes care of the uniqueness of the generated values.

Using this approach, the algorithm of your application can be:

The input to the method is a set of source language ( sourceLang ), array of target languages ( targetLang ), strings to be translated ( phraseToTranslate ).
UUID generated
The original string with the addition of a UUID written to the database.
The initial task is divided into several, for each of them the key is set from the phraseToTranslate + targetLang + UUID
As the translation tasks are completed, the translated phrases are written into the database. This uses the UUID of p.2.

If you use the "typical scheme" that you described - for example, to get a translation from sourceLang = ru, targetLang = fr, phraseToTranslate = text, it turns out I need to first get a "key" and then if I found it - search for trans - did I understand you correctly?
PS I also thought about the unique non-incremented id but thought that there is some more correct way to increment id without duplicates.
if I’m allowed to get rid of id in general, I’ll do it; if not, I’ll use the uuid module In any case, thank you so much.
@LeoMrakobes usually choose one natural language as their source.
For the example from the answer, this might be hello_message .
Actually these keys (+ language) are used to select translations.
But it all depends very much on how exactly you use these translations in the application.

nodeJs sequelize how to get rid of duplicates when writing in parallel

1 answer 1

More articles: