The problem is as follows. There is a database with approximately 50 billion unsigned bigint values. For further processing, they need to be clustered by a given Hamming distance threshold (i.e. the number of clusters is not known in advance). And, of course, no amount of RAM is enough to handle this volume at once, so the algorithm has to be incremental, with new data added progressively. So the question is: does such an algorithm even exist? Preferably implemented in some common language (pure math I don't understand). Offhand, googling turned up nothing :/
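To make the condition concrete, here is a toy in-memory sketch of what I mean by the clustering criterion: two values belong to the same cluster if they are connected through a chain of pairs whose Hamming distance is at most d. This brute-force version obviously doesn't scale to 50 billion rows (it's O(n²) and in-memory), it just illustrates the criterion; the names `hamming` and `cluster` are my own, not from any library:

```python
# Toy illustration of the clustering criterion, NOT the scalable solution.
# Two 64-bit values are in the same cluster if a chain of pairs with
# Hamming distance <= d connects them. Brute force over all pairs.

def hamming(a: int, b: int) -> int:
    return (a ^ b).bit_count()  # popcount of the XOR; needs Python 3.10+

def cluster(values: list[int], d: int) -> list[list[int]]:
    parent = list(range(len(values)))  # union-find over value indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        parent[find(i)] = find(j)

    # Merge every pair within the distance threshold.
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if hamming(values[i], values[j]) <= d:
                union(i, j)

    # Group values by their union-find root.
    clusters: dict[int, list[int]] = {}
    for i, v in enumerate(values):
        clusters.setdefault(find(i), []).append(v)
    return list(clusters.values())

if __name__ == "__main__":
    data = [0b0000, 0b0001, 0b0011, 0b1111_0000, 0b1111_0001]
    print(cluster(data, d=1))  # -> two clusters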
The task is not hypothetical; the project is commercial :) Thanks in advance.