I did a little experiment here ...
I created a vector of N random numbers, both positive and negative, and counted the unique values using different containers:
#include <algorithm>
#include <cstdlib>
#include <set>
#include <unordered_set>
#include <vector>

size_t usingSet(const std::vector<int>& input)
{
    std::set<unsigned> unique;
    for (int item : input)
        unique.insert(std::abs(item));
    return unique.size();
}

size_t usingHash(const std::vector<int>& input)
{
    std::unordered_set<unsigned> unique;
    unique.reserve(input.size());          // avoid rehashing during insertion
    for (int item : input)
        unique.insert(std::abs(item));
    return unique.size();
}

size_t usingVec(const std::vector<int>& input)
{
    std::vector<unsigned> uniq;
    uniq.reserve(input.size());
    for (int item : input)
        uniq.push_back(std::abs(item));
    std::sort(uniq.begin(), uniq.end());
    return std::unique(uniq.begin(), uniq.end()) - uniq.begin();
}
Each test finds the number of unique elements 1000 times. Times are in microseconds; give or take, of course, but quite suitable for a rough comparison. I didn't bother rounding: it's easy enough to convert to seconds or milliseconds in your head.
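For reference, a harness along these lines reproduces the setup; the rand() fill and the std::chrono timing here are my assumptions, only the 1000 repetitions and the microsecond units come from the description above:

#include <chrono>
#include <cstdlib>
#include <iostream>
#include <vector>

// Sketch of a possible test harness; uses the usingSet/usingHash/usingVec
// functions defined above. The fill and timing details are assumptions.
void runTest(size_t N)
{
    std::vector<int> src;
    src.reserve(N);
    for (size_t i = 0; i < N; ++i)
        src.push_back(rand() * (i % 2 ? 1 : -1));   // positive and negative values

    auto start = std::chrono::high_resolution_clock::now();
    size_t result = 0;
    for (int run = 0; run < 1000; ++run)            // 1000 repetitions, as in the text
        result = usingVec(src);                     // or usingSet / usingHash
    auto stop = std::chrono::high_resolution_clock::now();

    std::cout << "N = " << N << ": "
              << std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count()
              << " us, unique = " << result << '\n';
}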
                Set           Hash          Vector
-----------------------------------------------------------
N = 10:         790           1256          253
N = 100:        8304          8243          1192
N = 1000:       130070        78354         27573
N = 10000:      1543452       757059        498093
N = 100000:     14545110      4232742       6182957
N = 1000000:    127603400     27039847      61045621
So, the vector starts to lose to the hash somewhere between 10 thousand and 100 thousand elements. At small sizes the hash even loses to set, drawing roughly level at around 100 elements. Beyond that, set hopelessly loses to everyone: as I understand it, not only because of O(N*log N), but also because of the large number of memory allocations (it has no reserve() member, unlike the hash and the vector).

I didn't come up with a simple test of memory consumption, but it's clear that the vector beats everyone there :)

In short, the saddest option is set; there is no point in using it at all. For small sizes, up to tens of thousands, a vector is the better choice; beyond that, decide what matters more: speed or memory.
All of this is with Visual C++ 2015, 32-bit. With Visual C++ 2010, 32-bit, the vector's advantage is even more pronounced :) and here the hash loses across the board:
                Set           Hash          Vector
-----------------------------------------------------------
N = 10:         813           1459          223
N = 100:        8688          10795         1448
N = 1000:       135905        102716        29086
N = 10000:      1623786       1068878       519483
N = 100000:     18334735      8822607       6316194
N = 1000000:    179540082     83667899      61899536
Anyone who wants to repeat this with other compilers is welcome to ...
By the way, the vector version can be optimized further: since we only need the number of unique elements, we don't have to call unique() with its shuffling of data in memory; we can just walk through the sorted values and count the elements that differ from their neighbor, as sketched below.
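A sketch of that variant (the function name is mine; the counting loop replaces the unique() call):

#include <algorithm>
#include <cstdlib>
#include <vector>

// Counting-only variant: sort, then count elements that differ from
// their left neighbor instead of compacting the range with unique().
size_t usingVecCount(const std::vector<int>& input)
{
    std::vector<unsigned> uniq;
    uniq.reserve(input.size());
    for (int item : input)
        uniq.push_back(std::abs(item));
    std::sort(uniq.begin(), uniq.end());

    size_t count = 0;
    for (size_t i = 0; i < uniq.size(); ++i)
        if (i == 0 || uniq[i] != uniq[i - 1])
            ++count;    // the first element, or a value not equal to its predecessor
    return count;
}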
Update. I had assumed that the number of duplicates is small; the comments corrected me. I still couldn't bring myself to make the number of unique elements tiny, but I filled the array like this:

for (int i = 0; i < N; ++i)
    src.push_back((rand() % (N / 5)) * (i % 2 ? 1 : -1));   // ~N/5 distinct magnitudes, alternating sign

I.e. there are plenty of duplicates, but the number of unique elements is still O(N).

Results for VC++ 2015; I did not wait for the million-element run:
                Set           Hash          Vector
-----------------------------------------------------------
N = 10:         279           760           229
N = 100:        2693          2785          1185
N = 1000:       53087         24940         26121
N = 10000:      806965        273205        472123
N = 100000:     12350458      3440321       6006227
The boundary has shifted down somewhat, but the principle stays the same: set loses overall, yet beats the hash on arrays of up to 100 elements; the hash starts to lead from about 1000, not from 10,000.
In the limiting case, when all N elements are ones, the results are almost paradoxical:
                Set           Hash          Vector
-----------------------------------------------------------
N = 10:         200           895           219
N = 100:        888           1840          493
N = 1000:       7676          10020         2778
N = 10000:      73524         102685        25881
N = 100000:     733058        1167238       259100
N = 1000000:    7396809       11645557      3561249
Vector, and only vector! The hash in this case loses even to set.
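For reference, that degenerate case corresponds to a fill along these lines (my shorthand, not quoted from the original):

std::vector<int> src(N, 1);   // all N elements equal to 1 -> exactly one unique value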
That's it, I have no more time for experiments; anyone who wants to can carry on :)
Update 2. It wasn't for nothing that Guntheroth wrote in his "Optimized C++" that unordered_set, while it does give an effect, is not at all the effect you would expect after reading how much it gets praised :) Quote from the book: the hash table std::unordered_map is faster than std::map, but not by the order of magnitude that its reputation suggests.
O(N*log N), but in actual running time, sorting a vector + unique will be faster ... - Harry