What function does std::hash use to compute a hash, and how much better is it than plain crc32? The priorities are fewer collisions and faster computation. Target platform: 32-bit. Quotations from the standard are not welcome as an answer.
3 answers
Target platform: 32-bit.
Do you realize that crc32 and the bitness of your system are unrelated? crc32 was used even on 16-bit MS-DOS systems. The 32 is just the size of the resulting hash in bits.
What function does std::hash use to compute a hash
std::hash is not a function but a class template, so different types are hashed with different algorithms. For example, std::hash<int> applied to 1 will return 1, since the hash of a number is the number itself.
In addition, you did not say which implementation of the C++ standard library you are using. In theory, one can imagine a library that uses crc32 for all hashing (and works very poorly because of it).
crc32 is used for small data and is fast. md5 is slower, but much harder to collide (a 32-bit digest versus md5's 128 bits).
Reply to comments:
I was comparing crc32 with the C++11 hash, and the latter, in turn, depends on the bitness of the OS. For a fair comparison they should have the same bit width.
It depends first on the ISO standard, then on the compiler vendor: how it implements hashing is its own business wherever the standard leaves it unspecified. And yes, you are right that std::hash does depend on the bitness of the system.
The actual hash functions are implementation-defined, which means that clang, msvc, gcc and everyone else may implement them differently. Did I ask about md5 and its speed? It is outdated, and where I need hashes with minimal collisions I will not use it.
What news. md5 is vulnerable to deliberate attacks (a collision can be constructed on purpose), so it is outdated as a cryptographic hash (thanks to @D-side for the correction), but not as a general-purpose hash function. (There is a good md5/sha1 comparison.)
The default function gcc uses for hashing strings is MurmurHashUnaligned2.
And here is an excellent comparison of various hashing algorithms.
- @strangeqargo Regarding the edit... Yes, the hash algorithm is not standardized, but it must return a size_t, and the size of that number depends on the bitness. So whatever algorithm the compiler vendor implements, the result will be a number of that width. In my subjective opinion, if two hashes produce numbers of the same width, the collision probability will be the same. - Max ZS
- @strangeqargo The processing speed, of course, will depend directly on the algorithm. Even for the same crc32 there can be differences, for example whether it precomputes the table at run time or ships it ready-made. - Max ZS
- @strangeqargo Yes, I understand; that is exactly why the question was raised. But there is one more thing. I would like to use the standard function, yet it turns out it is not really "standardized". That is, if it is used on both the clients and the server, and over time the hash changes when the server is recompiled on an updated OS, everything turns to garbage. - user212277
- "outdated as a function for encryption": let me help you a little with the terminology. That is called a "cryptographic hash", and it still has nothing to do with encryption. - D-side
- @strangeqargo I compared std::hash and crc32 on byte arrays (essentially on strings). My tests can hardly be called rigorous, but std::hash really does beat crc32 in speed. Collisions were essentially zero for both: only once did crc32 produce two collisions, on 100k strings 2k characters long. At smaller volumes neither algorithm collided. The shorter the strings, the smaller the speed difference; on single words and phrases they are equal in speed (which is logical, since most of the cost presumably goes to various kinds of initialization). - user212277
Obviously, the answer depends on the type of data being hashed, the standard library version in use, the bitness, and so on. With libstdc++, std::hash for std::string ends up in the _Hash_bytes function, here: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/src/c%2B%2B11/hash_bytes.cc
Obviously it is something quite simple, chosen for performance reasons. No CRC. This hash hardly even has a name.
- The name is given in the comments in the code: 'Based on the public domain MurmurHashUnaligned2, by Austin Appleby. murmurhash.googlepages.com' - αλεχολυτ
- Yes, αλεχολυτ is right, this is MurmurHash2; that also followed from the link in my (since deleted) comments. But the question remains open: which one is better? I do not see that in the answer. - user212277
- MurmurHash2 is faster (it works on blocks of 32 or 64 bits). CRC is easier to implement in hardware (not relevant in a C++ context). That is the whole difference. - Chorkov
- @Chorkov embedded.com/design/programming-languages-and-tools/4438660/… a good article on the topic - strangeqargo
Based on the comments, as I understand it, std::hash for strings will use the MurmurHashUnaligned2 hashing algorithm. For plain numbers there is no algorithm as such: the result is the number itself. So for numbers, std::hash will certainly be faster than crc32 and essentially collision-free. With strings the situation is somewhat different, so I gather the interesting comparison is MurmurHashUnaligned2 versus crc32. Given that both algorithms produce numbers of the same bit width (as stated in the question), I think the collision probability will be roughly equal. In general, a good comparison of the algorithms' speed and collisions on various data types can be found here; both crc32 and murmur2 are present in that test.
- Thanks for the answer. Judging by the tests strangeqargo also cited, it turns out that murmurhash2 is faster than crc32 but loses on the number of collisions. I always assumed a hashing algorithm exists for a reason, and on collisions it ought to beat the checksum :( - user212277
- The standard library's priorities, I think, were speed / memory / collisions, in that order. I don't know why the committee decided that, but they presumably had their reasons, and you can probably even find them stated somewhere - strangeqargo
- @strangeqargo No fortune-teller needed for that :) Obviously the standard settled on something for reasons of its own, and judging by the tests, collisions were clearly "not held in high esteem"; it was speed and memory. For crc32, the generated table alone eats a kilobyte. - Max ZS
- Regarding "more tested": I did not mean the properties of the algorithms themselves. I meant the usage scenario itself, namely the fact that the function the standard library uses may well change. Whether it does, only time will tell. - user212277