We all know that std::set is a red-black tree (so lookup is O(log n)), and std::unordered_set is a hash table with constant-time lookup on average.
What advantages does set have, apart from keeping the elements in order? If order does not matter to me, is it always better to choose the hash table?

  • set uses a little less memory and is more predictable. A hash table is a probabilistic thing, after all. But in general, yes, the hash table is usually better. - pavel
  • Well, for example, excessive memory use. The hash function may be unsuitable (or badly written for user data), so the data hashes poorly. As always, you need to experiment; there is no panacea ... - Harry
  • From what I have read, the unordered_set implementation allocates memory in small chunks (like a linked list), and under heavy load it will fragment memory. - KoVadim
  • @KoVadim How so? ... How do you then get O(1) access if the memory is in chunks? - pavel

1 answer

Sometimes keeping the elements in sorted order matters a great deal for the algorithm. For example, try finding the value (int) closest to a given one when the values are stored in an unordered_set: there is no way to do it short of scanning every element.
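To illustrate the point about ordering, here is a minimal sketch of a nearest-value query on a std::set using lower_bound. The helper name closest_value and the tie-breaking rule (the smaller element wins) are assumptions made for this example, not something from the question.

```cpp
#include <iterator>
#include <set>

// Hypothetical helper: returns an iterator to the element closest to `target`,
// or values.end() if the set is empty. On a tie the smaller element wins.
std::set<int>::const_iterator closest_value(const std::set<int>& values, int target) {
    // First element that is >= target, found in O(log n).
    auto hi = values.lower_bound(target);
    // Nothing smaller than target exists (this also covers the empty set,
    // where begin() == end()).
    if (hi == values.begin()) return hi;
    auto lo = std::prev(hi);  // largest element that is < target
    // Nothing >= target exists, so the largest element is the closest.
    if (hi == values.end()) return lo;
    // Use long long so the differences cannot overflow int.
    long long d_lo = static_cast<long long>(target) - *lo;
    long long d_hi = static_cast<long long>(*hi) - target;
    return d_lo <= d_hi ? lo : hi;
}
```

With a set holding {1, 10, 20, 35}, closest_value(s, 12) points at 10. With unordered_set there is no lower_bound, so the same query degrades to an O(n) scan.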

Sometimes it is impossible to write a hash function that is adequate to the task, for example when storing double values.

Sometimes the expected data set is simply small. O(1) is not always faster than O(log(n)); the only guarantee is that there is some data set size beyond which O(1) will be faster. In practice, it is quite common for the expected number of values to be just 3-5.
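Whether the tree or the hash table wins at such sizes is best measured rather than assumed. Below is a rough timing sketch; the helper name time_lookups and its parameters are hypothetical, and the numbers it produces depend heavily on the compiler, the standard library, and the key type.

```cpp
#include <chrono>
#include <cstddef>
#include <set>
#include <unordered_set>
#include <vector>

// Hypothetical helper: performs `rounds` passes of lookups of every key in
// `keys` against `container` and returns the elapsed time in nanoseconds.
// The number of successful lookups is written to `hits` so that the compiler
// cannot discard the loop as dead code.
template <class Container>
long long time_lookups(const Container& container, const std::vector<int>& keys,
                       int rounds, std::size_t& hits) {
    hits = 0;
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < rounds; ++r)
        for (int key : keys)
            hits += container.count(key);  // 0 or 1 for set-like containers
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Calling it once with a std::set<int> and once with a std::unordered_set<int> built from the same 3-5 values gives two timings to compare. Treat a single run as a hint only: repeat the measurement and use optimized builds before drawing conclusions.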

It should also be remembered that computing the hash of a string costs O(k) (where k is the length of the string) even on average, whereas a comparison costs O(k) only in the worst case and O(log(n)) on average if the strings are random from their first characters onward. With set, only one "unlucky" full-length comparison is made, at the moment the desired element is found, and unordered_set has to make that same expensive comparison as well. So if the strings being looked up are only very rarely present in the set, the O(k) hash will cost more than the O(log(n)) comparisons.
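The argument about string keys can be checked by counting characters instead of measuring time. The sketch below looks up one absent key in both containers and reports how many characters the comparator examined and how many the hash had to read. All names (CountingLess, CountingHash, fill_random, cost_of_miss) are hypothetical, and the keys are pseudo-random lowercase strings, which is the favorable case for comparisons described above.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <unordered_set>

// Globals only to keep the sketch short.
std::size_t chars_compared = 0;  // character positions examined by the comparator
std::size_t chars_hashed = 0;    // characters the hash function had to read

// Lexicographic "less" that counts the positions it inspects.
struct CountingLess {
    bool operator()(const std::string& a, const std::string& b) const {
        for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
            ++chars_compared;
            if (a[i] != b[i]) return a[i] < b[i];
        }
        return a.size() < b.size();
    }
};

// Wraps std::hash and records the key length, assuming the hash reads
// the whole string.
struct CountingHash {
    std::size_t operator()(const std::string& s) const {
        chars_hashed += s.size();
        return std::hash<std::string>{}(s);
    }
};

// Inserts n pseudo-random lowercase strings of length k (fixed-seed LCG).
template <class Set>
void fill_random(Set& out, std::size_t n, std::size_t k) {
    unsigned long long state = 12345;
    for (std::size_t i = 0; i < n; ++i) {
        std::string s(k, 'a');
        for (char& c : s) {
            state = state * 6364136223846793005ULL + 1442695040888963407ULL;
            c = static_cast<char>('a' + (state >> 33) % 26);
        }
        out.insert(s);
    }
}

struct Cost { std::size_t compared; std::size_t hashed; };

// Looks up one key that is (almost certainly) absent and reports the work done.
Cost cost_of_miss(std::size_t n, std::size_t k) {
    std::set<std::string, CountingLess> tree;
    std::unordered_set<std::string, CountingHash> table;
    fill_random(tree, n, k);
    fill_random(table, n, k);
    std::string missing(k, 'm');
    chars_compared = chars_hashed = 0;  // ignore the cost of building
    (void)tree.find(missing);
    (void)table.find(missing);
    return {chars_compared, chars_hashed};
}
```

For example, cost_of_miss(1000, 200) should show the hash reading all 200 characters while the tree examines only a few characters per level. Keys that share long common prefixes would shift the balance back toward the hash table.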