In order to use your own class as a key in unordered_map (for example) you must define a hash function. Cppreference.com example

 template<> struct hash<S> { typedef S argument_type; typedef std::size_t result_type; result_type operator()(argument_type const& s) const { result_type const h1 ( std::hash<std::string>()(s.first_name) ); result_type const h2 ( std::hash<std::string>()(s.last_name) ); return h1 ^ (h2 << 1); } }; 

this concerns two fields, I am confused by this moment h1 ^ (h2 << 1) I saw examples without shift and now I don’t quite understand how I project this example on a larger number of fields, according to which rule. Can I just do h1 ^ h2 ^ ... ^ h(n) ? or do each subsequent hash I need to shift? h1 ^ ( h2 << 1 ) ^ ... ^ ( h(n) << n ) ?

I would not like to encounter conflicts, they will be very difficult to detect, but they will spoil the work significantly.

  • one
    You will inevitably encounter collisions - ixSci

2 answers 2

In fact, it's best to try to “mix” hashcodes as much as possible. For example, Jon Skeet gives this example :

 result_type hash = (result_type)2166136261; hash = hash * 16777619 ^ std::hash<std::string>()(s.first_name); hash = hash * 16777619 ^ std::hash<std::string>()(s.second_name); hash = hash * 16777619 ^ std::hash<std::string>()(s.third_name); return hash; 

Code

 h1 ^ h2 ^ ... ^ h(n) 

It is considered incorrect, because quite often the fields have the same values ​​and have equal hashes (and for integer fields, the field itself is often used as a hash). Since during the operation ^ equal values ​​are mutually destroyed, then we get a smaller number of hash values ​​and, accordingly, a lot of collisions.

Also popular option is this:

 result_type hash = start; hash = hash * factor + std::hash<std::string>()(s.first_name); hash = hash * factor + std::hash<std::string>()(s.second_name); hash = hash * factor + std::hash<std::string>()(s.third_name); return hash; 

(it uses Java and .NET) for a suitable start and factor value. As a factor , a small prime number is usually used, like 13 or there 29. start should be in theory mutually simple with a factor (for example, another simple number, usually more).


Here is a good overview article (in English) on various hash calculation methods.

    Here's another option for the link using boost

     #include <boost/functional/hash.hpp> struct KeyHasher { std::size_t operator()(const Key& k) const { using boost::hash_value; using boost::hash_combine; // Start with a hash value of 0 . std::size_t seed = 0; // Modify 'seed' by XORing and bit-shifting in // one member of 'Key' after the other: hash_combine(seed,hash_value(k.first)); hash_combine(seed,hash_value(k.second)); hash_combine(seed,hash_value(k.third)); // Return the result. return seed; } };