Good day.

There is a simple task: there is a huge array of strings of 64 characters each, from time to time it is necessary to check whether there is an input string in it. The problem is complicated by the fact that storing this array somewhere is not possible. But there is an opportunity to count once, say, some hash of this array or something. Please tell me whether it is possible to create any "convolution" of this array, by which you can check whether there was any element in the array, by which this "convolution" was counted. This "convolution" must be much smaller than the original array. The data in the array is random, the source data need not be restored, any error is allowed <100%

Is there a solution to such problems?

  • CRC32 store. The error is exactly less than 100% :-) - Vladimir Martyanov
  • Thank you all for your promptness. Already looking, google) - Impatient iguanas

1 answer 1

I would recommend to see Bloom's filters - quickly, if it shows that there is no row - then it is not 100%, if there is, then you can estimate the probability of false positives.

The meaning is approximately the same - bit hash values ​​are simply summed up with the operation "or", and then the desired hash is checked using "and". If all its single bits are in place, it is possible that this value is available. If there is a mismatch, it is definitely not.

You just need to select a sufficient hash function :)

Here is another source .