Work with a large collection

Question

I have the following classes:

    class Metric
    {
        public Word Word1;
        public Word Word2;

        public Metric(Word word1, Word word2)
        {
            Word1 = word1;
            Word2 = word2;
        }

        public int Simil;

        public override int GetHashCode()
        {
            return Word1.GetHashCode() + Word2.GetHashCode();
        }
    }

    public class Word
    {
        public string Text;

        public Word(string text)
        {
            Text = text;
        }

        public override int GetHashCode()
        {
            return Text.GetHashCode();
        }
    }

The Metric class stores the similarity of two words.

Another class has a field of type HashSet that holds these Metric objects.

Once the collection grows to several million elements, adding new ones becomes noticeably slower than it was at the start.

Is there any way to avoid this? In theory, a LinkedList would be faster to add to, but then I lose the uniqueness check on word pairs, which I rely on to avoid recalculating the same pair.

Answer 1 · Igor · 2018-10-10T21:17:02

Cache the GetHashCode result:

    class Metric
    {
        private int fHashCode;

        public Word Word1 { get; private set; }
        public Word Word2 { get; private set; }

        public Metric(Word word1, Word word2)
        {
            Word1 = word1;
            Word2 = word2;
            fHashCode = Word1.GetHashCode() + Word2.GetHashCode();
        }

        public int Simil;

        public override int GetHashCode()
        {
            return fHashCode;
        }
    }
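One thing worth checking alongside the cached hash: HashSet decides whether two elements are the same using Equals together with GetHashCode. If the classes as posted are complete (an assumption, since Equals may simply have been left out of the question), two Metric objects built from the same words are compared by reference and both end up in the set. A minimal sketch that pairs the cached hash with value equality might look like this; treating the pair as ordered and mixing the hashes with the multiplier 397 are illustrative choices, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;

public sealed class Word : IEquatable<Word>
{
    public string Text { get; }

    public Word(string text)
    {
        Text = text ?? throw new ArgumentNullException(nameof(text));
    }

    // Two words are equal when their text is equal (ordinal comparison).
    public bool Equals(Word other) => other != null && Text == other.Text;

    public override bool Equals(object obj) => Equals(obj as Word);

    public override int GetHashCode() => Text.GetHashCode();
}

public sealed class Metric : IEquatable<Metric>
{
    // Computed once in the constructor; safe because Word1 and Word2 never change.
    private readonly int hashCode;

    public Word Word1 { get; }
    public Word Word2 { get; }

    // Payload only: deliberately not part of equality or the hash.
    public int Simil;

    public Metric(Word word1, Word word2)
    {
        Word1 = word1 ?? throw new ArgumentNullException(nameof(word1));
        Word2 = word2 ?? throw new ArgumentNullException(nameof(word2));

        // Assumption: (a, b) and (b, a) are different pairs, so the mix is order-sensitive.
        unchecked
        {
            hashCode = (Word1.GetHashCode() * 397) ^ Word2.GetHashCode();
        }
    }

    public bool Equals(Metric other) =>
        other != null && Word1.Equals(other.Word1) && Word2.Equals(other.Word2);

    public override bool Equals(object obj) => Equals(obj as Metric);

    public override int GetHashCode() => hashCode;
}
```

With this in place, HashSet.Add returns false for a pair that is already present, which is the uniqueness check the question relies on.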

Hmm... I thought that when an element is added to a HashSet, the set computes the hash once and stores it, so that it does not have to recompute the hashes of the existing elements on later additions. Still, although things got faster, the slowdown now just starts a little later =( - iluxa1810
@iluxa1810 could the hash code algorithm and the number of collisions be what matters here? - 4per
The hash is essentially the standard one for strings, the only difference being that the hash codes of the two strings are added together. We do not override the hash code for the basic types, so this should work. Collisions should not degrade performance, since the element is simply not added. - iluxa1810
"Collisions should not degrade performance, since the element is simply not added" - why do you think so? ideone.com/XiIyKd - Andrey NOP
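
The effect Andrey NOP is pointing at can be reproduced without timing anything, simply by counting how often Equals runs while a set is being filled. The sketch below uses a hypothetical CountingKey type (not from the question) whose hash can be forced to a constant; every name in it is made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type: all instances are distinct, but the hash can be made to collide.
public sealed class CountingKey : IEquatable<CountingKey>
{
    // Total number of Equals calls made by the set since the last reset.
    public static long EqualsCalls;

    private readonly int id;
    private readonly bool collide;

    public CountingKey(int id, bool collide)
    {
        this.id = id;
        this.collide = collide;
    }

    public bool Equals(CountingKey other)
    {
        EqualsCalls++;
        return other != null && id == other.id;
    }

    public override bool Equals(object obj) => Equals(obj as CountingKey);

    // Either a distinct hash per key, or the same hash for every key.
    public override int GetHashCode() => collide ? 0 : id;
}

public static class CollisionDemo
{
    // Adds n distinct keys to a HashSet and reports how many Equals calls that took.
    public static long CountEqualsCalls(int n, bool collide)
    {
        CountingKey.EqualsCalls = 0;
        var set = new HashSet<CountingKey>();
        for (int i = 0; i < n; i++)
        {
            set.Add(new CountingKey(i, collide));
        }
        return CountingKey.EqualsCalls;
    }
}
```

Comparing CollisionDemo.CountEqualsCalls(1000, true) with CollisionDemo.CountEqualsCalls(1000, false) shows the difference: when hash codes differ, the set can skip non-matching entries by comparing the stored hash values, but when they all collide, every Add has to walk the whole chain and call Equals on each entry even though the new element is not a duplicate. The work of filling the set then grows roughly quadratically with the number of elements.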
