How can md5 work correctly?

Question

If you take any string and apply md5 to it 36^32 times (the number of possible md5 strings), the results will start to repeat after that. So it turns out that, for example, the md5 of "hello" coincides with the md5 of some other word. Where am I wrong?

@And I'm not trying to do anything; I just proved that an md5 hash is not unique to one string
@Yaant Well, I just didn’t think that there was such an obvious drawback in md5

Accepted Answer · 2018-05-13T08:29:48

MD5, like all other hashes, is designed to quickly assess whether two strings are likely to be the same while storing only a small amount of information (16 bytes). The only proof that two strings are exactly equal is a comparison of the original strings, or of their images under a lossless (reversible) encoding, such as compression by an archiver. But a string cannot be compressed indefinitely without losing information.

When you need to judge, with acceptable accuracy, whether the file in front of you is the same 1-gigabyte file you saw before, without storing the file itself, hashes are the only option. If you think the probability of a collision in MD5 is too high for your task, use another hash or several at once. But of course that only reduces the likelihood of an error; it does not eliminate it.
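As a rough illustration of the file case, here is a minimal Python sketch that hashes a file in chunks with the standard `hashlib` module, so only the 16-byte digest has to be kept. The helper names `file_md5` and `probably_same_file` and the chunk size are assumptions made for this example.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the 16-byte MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a 1 GB file never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.digest()


def probably_same_file(path, stored_digest):
    """True if the file's MD5 matches a digest stored earlier.

    A match means "very likely the same content", not a proof of equality.
    """
    return file_md5(path) == stored_digest
```

A mismatch proves the files differ; a match only makes equality very likely.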

Applying md5 several times to the result of another md5 makes no sense. If the md5 values of two strings are equal to begin with, they will stay equal after any number of repetitions. The converse (concluding that two objects are equal because their repeatedly applied md5 values are equal) still works with very high probability, but it is not guaranteed, and every extra round adds another chance of error, because there can be two different 16-byte sequences that give the same md5.
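The effect of repeated hashing is easier to see on a deliberately tiny hash. The sketch below is an illustration only: `tiny_hash` (the first 2 bytes of MD5) is a toy stand-in invented for this example so that collisions show up quickly, and `distinct_after_rounds` is a hypothetical helper name. Starting from all 65,536 two-byte values, each extra round can only keep or reduce the number of distinct results.

```python
import hashlib


def tiny_hash(data: bytes) -> bytes:
    """Toy 16-bit hash: the first 2 bytes of MD5 (illustration only)."""
    return hashlib.md5(data).digest()[:2]


def distinct_after_rounds(rounds: int) -> list:
    """Count the distinct values left after each round of re-hashing.

    Starts from every possible 2-byte input and applies tiny_hash
    repeatedly to the surviving set of values.
    """
    values = {i.to_bytes(2, "big") for i in range(1 << 16)}
    counts = []
    for _ in range(rounds):
        values = {tiny_hash(v) for v in values}
        counts.append(len(values))
    return counts


if __name__ == "__main__":
    # Each count is at most the previous one: re-hashing merges values
    # that collide and can never separate values that are already equal.
    print(distinct_after_rounds(5))
```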

For the same reason, it makes no sense to chain different hashes.
If the md5 of two strings is the same, then sha1(md5()) of those strings is also the same.
But if there are two different strings (X != Y) for which md5(X) = md5(Y), the probability that sha1(X) = sha1(Y) is extremely small, because the algorithm is completely different.
So if we store the md5 of the source string and the sha1 of the same source string (not of its md5), the total probability of an accidental collision is about 1 / 2^288, treating the two hashes as independent: together the digests hold 128 + 160 = 288 bits.
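A minimal sketch of this scheme in Python, assuming the standard `hashlib` module; the function name `fingerprint` is made up for the example. Both digests are computed from the same source bytes and concatenated, giving 16 + 20 = 36 bytes, i.e. 288 bits.

```python
import hashlib


def fingerprint(data: bytes) -> bytes:
    """MD5 and SHA-1 of the same source data, concatenated (36 bytes)."""
    # Both hashes are taken from the original data, not from each other:
    # sha1(md5(data)) would inherit every MD5 collision.
    return hashlib.md5(data).digest() + hashlib.sha1(data).digest()
```

Two inputs are then treated as probably equal only when the whole 36-byte fingerprint matches.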

Zergatul · Answer 2 · 2018-05-13T00:57:01

By design, the output of a hash function should be indistinguishable from a random number. So the cycle can begin after any number of iterations n, where 1 <= n <= 2^128. It is even possible that there is a string x such that md5(x) = x.

There is nothing wrong with a hash function "looping"; it is the expected behavior from the point of view of probability theory.
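The looping can be observed directly on a shrunken hash. The following Python sketch is an illustration under an assumption: `tiny_hash` (the first 2 bytes of MD5) is a toy stand-in invented here so that the loop shows up within at most 65,536 steps; for full MD5 the same argument applies with 2^128 possible values. `find_cycle` is likewise a hypothetical helper name.

```python
import hashlib


def tiny_hash(data: bytes) -> bytes:
    """Toy 16-bit hash: the first 2 bytes of MD5 (illustration only)."""
    return hashlib.md5(data).digest()[:2]


def find_cycle(seed: bytes):
    """Iterate tiny_hash from seed until a value repeats.

    Returns (tail, cycle): the number of steps taken before the loop
    is entered, and the length of the loop itself.
    """
    first_seen = {}  # value -> step at which it first appeared
    value = tiny_hash(seed)
    step = 0
    while value not in first_seen:
        first_seen[value] = step
        value = tiny_hash(value)
        step += 1
    tail = first_seen[value]
    return tail, step - tail


if __name__ == "__main__":
    tail, cycle = find_cycle(b"hello")
    print(f"loop of length {cycle} entered after {tail} steps")
```

Because only 65,536 outputs exist, a repeat is forced by the pigeonhole principle; where it happens depends on the seed.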
