The most efficient way to search arrays?

Question

There are 2 masivas (very large). The sought value (only one) is in one of them. Need some good search algorithm.

I would not like to first iterate through one array completely, then, if the desired value is not there, run the second iteration.

How effective would it be to run a search on 2 arrays at the same time in terms of performance?

The goal is to find the best way (the least resource-intensive and fastest).

What search algorithms will be effective in this situation?

Is it possible to do something with these arrays in advance?
javascript is single-threaded, and you need to really try to run the search in two arrays at once .
@hedgehogues The search happens multiple times. You can sort

hedgehogues hedgehogues 2.501 13 38 · Accepted Answer · 2017-03-15T13:07:10

To solve your problem, you can suggest several methods: binary search by polynomial hash, prefix tree.

Polynomial hashes. A hash function is a mapping that assigns a number to an object. In our case, the string. Polynomial hashes are a polynomial. Generally speaking, you can take any function. But in this case it may turn out that for two elements (lines) we have two identical objects. Polynomial functions minimize this risk. They are considered so:

f(строка) = строка[0] + строка[1] ^ 2 + строка[2] ^ 3 + ... + строка[n- 1]^(n-1),

where string [i] is the character code.

As you can see, there are many calculations with degrees. It is not good. If speed is needed, then quick exponentiation can be applied. In addition, it turns out that large degrees are large numbers, lower speed and more memory, so a polynomial hash can be taken modulo:

[f(строка)]_k , where k is a large prime number, and operation [.] is the taking of the remainder of the division. More here , here and here .

I note that taking the remainder of the division can be carried out separately from each term:

 f(строка) = [строка[0]] + [строка[1]] ^ 2 + [строка[2]] ^ 3 + ... + [строка[n- 1]]^(n-1)

Thus, counting the hashes, arrange the lines by it. Then we will do sorting by hash. In the sorted array, you can search for the desired element by binary search. Then the running time is O (max (log N, log M)), where M, N are the string lengths.

Prefix Tree This tree contains all string prefixes. Then we can search the tree for O (max (maximum string length)).

Prefix Tree (Trie)

Hash (as a structure) without collision resolution options is a non-usable thing in principle.
Key hashes that are computed as a polynomial hash are too slow.
The number of collisions will be small relative to the total size.
I'm talking about a hash table, and you, it turns out, about an array of "hash function + value" pairs, which then need to be sorted?

PinkTux PinkTux one · Answer 2 · 2017-03-15T13:01:57

To search for strings, the fastest structure is the ternary tree. Any other options will certainly be worse, and it makes no sense to discuss them in this context.

Implementations for JS are, for example, tritium . From examples from there:

 tritium = require('tritium'); tree = tritium.ternarySearchTree(); tree.add('airplane'); tree.add('airport'); tree.add('airside'); tree.add('apple'); tree.has('air'); // true tree.has('apple'); // true tree.has('apples'); // false tree.prefixSearch('airp'); // [ 'airplane', 'airport' ] tree.prefixSearch('air'); // [ 'airplane', 'airport', 'airside' ] tree.prefixSearch('be'); // []

Something seems to me that it will be longer than the prefix one
@hedgehogues, something seems to me both of you to think too much.
Judging by the commentary, he just needs to check the specific value without variations, that is, it makes no sense to make a garden with prefixes and trees
@Grundy, in the condition there is not only a search, but also a quick search on a very large array of strings.
Nothing better than a ternary (ok, generally prefix) tree has not yet been invented for this.
And then, why did you think that it is not suitable for finding a specific value?
@PinkTux, I mean, there is no point in manually doing this when there is a regular object that you can quickly check for the presence of a string key.

The most efficient way to search arrays?

2 answers 2

More articles: