There are two collections: a collection of words and a collection of texts (1 item = 1 page).
I take a page, split it into words, and use the Levenshtein distance to look for words on the page that are similar to the words in the word collection.
This takes a very long time... How can it be parallelized? If I am not mistaken, iterators are not thread-safe, so as I understand it, to achieve parallelism I would have to copy the word collection for each thread?
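For illustration, here is a minimal Python sketch of the setup described above. The question does not name a language, so Python is an assumption, and so are all the names (`levenshtein`, `similar_words_on_page`, `search_pages`, `max_distance`, `workers`). It assumes the word collection is only read, never modified, during the search. Under that assumption each page can be handled as an independent task: every `for` loop creates its own iterator, so the workers share the collection itself without sharing an iterator and without per-thread copies.

```python
from concurrent.futures import ThreadPoolExecutor


def levenshtein(a, b):
    """Edit distance via dynamic programming, keeping only two rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similar_words_on_page(page, words, max_distance):
    """Map each collection word to the page tokens within max_distance of it."""
    tokens = page.split()
    hits = {}
    for word in words:  # a fresh iterator per call, nothing shared but the data
        close = [t for t in tokens if levenshtein(word, t) <= max_distance]
        if close:
            hits[word] = close
    return hits


def search_pages(pages, words, max_distance=1, workers=4):
    """Process pages as independent tasks over one shared, read-only word tuple."""
    shared_words = tuple(words)  # immutable snapshot; not copied per worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input pages
        return list(pool.map(
            lambda page: similar_words_on_page(page, shared_words, max_distance),
            pages,
        ))


if __name__ == "__main__":
    print(search_pages(["the quick brwn fox", "lazy dog"], ["brown", "dog"]))
```

One caveat for this particular language choice: in CPython the global interpreter lock prevents threads from running this CPU-bound loop truly in parallel, so the sketch shows only the structure of the task split. A process pool (with a top-level function in place of the lambda) would be the usual substitute there. In languages with real multithreading the same structure applies directly.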