If you take any string and apply md5 to it 36 ^ 32 times (the number of possible md5 strings), the results will start to repeat after that. So it turns out that, for example, the md5 of "hello" coincides with the md5 of some other word. Where am I wrong?
2 answers
MD5, like any other hash, is designed to let you quickly check that two strings are very likely the same while storing only a small amount of information (16 bytes). The only proof that two strings are exactly equal is a comparison of the original strings, or of their images under a reversible encoding such as lossless compression. But a string cannot be compressed indefinitely without losing information.
When you need to decide, with acceptable accuracy, that the file in front of you is the same 1-gigabyte file you saw before, without storing the file itself, hashes are the only way out. If you think the probability of an MD5 collision is too high for your task, use another hash, or several at once. Of course, this only reduces the probability of error; it never eliminates it.
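A minimal sketch of that idea in Python, assuming the standard `hashlib` module. The function name `file_fingerprint` and the 1 MiB chunk size are illustrative choices, not something from the question. It reads the file in chunks, so a 1-gigabyte file never has to fit in memory, and it computes two different hashes in a single pass:

```python
import hashlib

def file_fingerprint(path, chunk_size=1 << 20):
    """Return (md5_hex, sha1_hex) for the file at `path`.

    The file is read in chunks of `chunk_size` bytes (an arbitrary
    choice) and both hashes are updated from the same data.
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at EOF
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```

Two files with the same pair of digests are very likely identical; a differing pair proves they are different.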
Applying md5 several times to the result of another md5 makes no sense. If the md5 of two strings is equal to begin with, it stays equal however many times it is applied. The converse (concluding that the objects are equal because their repeatedly applied md5 values are equal) works with very high probability but is not guaranteed, because there can be two different 16-byte sequences that give the same md5.
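A small Python illustration of the first half of that argument, assuming `hashlib` and a hypothetical helper `md5_iter` that feeds each raw 16-byte digest back into MD5 (the question may have meant hashing the hex string instead; the reasoning is the same). Since MD5 is deterministic, equal inputs stay equal after any number of rounds:

```python
import hashlib

def md5_iter(data: bytes, rounds: int) -> bytes:
    """Apply MD5 `rounds` times, feeding each 16-byte digest back in."""
    for _ in range(rounds):
        data = hashlib.md5(data).digest()
    return data

a = b"hello"
b = b"hello"

# Equal after one round implies equal after any number of rounds,
# so the extra rounds carry no additional information.
assert md5_iter(a, 1) == md5_iter(b, 1)
assert md5_iter(a, 1000) == md5_iter(b, 1000)
```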
- So then it makes no sense to use different hashes either. If the md5 of two strings is the same, then the md5(sha1()) of those strings is also the same. - Miqayel
- @Miqayel Of course. But if there are two different strings (X != Y) for which md5(X) = md5(Y), then the probability that sha1(X) = sha1(Y) is extremely small, because the algorithm is completely different. So if we store the md5 of the source string and the sha1 of the same source string (not of its md5), the total collision probability is about 1 / 2^288. - Mike
- Cool!) I hadn't thought of that) Thank you! - Miqayel
By design, the output of a hash function should be indistinguishable from a random number. So the cycle can begin after any number of iterations n, where 1 <= n <= 2^128. It is even possible that there is a string x such that md5(x) = x.
There is nothing wrong with a hash function "looping"; it is the expected behavior from the point of view of probability theory.
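The looping is easy to observe on a scaled-down model. The sketch below is only an illustration: `tiny_hash` is a made-up 16-bit hash (the first two bytes of MD5 over a 2-byte input), so it has just 65536 possible values instead of 2^128, and `find_cycle` is a hypothetical helper that iterates it until a value repeats:

```python
import hashlib

def tiny_hash(x: int) -> int:
    """Toy 16-bit hash: the first two bytes of MD5 over a 2-byte input."""
    digest = hashlib.md5(x.to_bytes(2, "big")).digest()
    return int.from_bytes(digest[:2], "big")

def find_cycle(start: int):
    """Iterate tiny_hash from `start`; return (tail_length, cycle_length)."""
    seen = {}  # value -> step at which it first appeared
    x, step = start, 0
    while x not in seen:
        seen[x] = step
        x = tiny_hash(x)
        step += 1
    tail = seen[x]            # steps taken before entering the cycle
    return tail, step - tail  # the cycle length is the gap between repeats

tail, cycle = find_cycle(0)
print(f"cycle of length {cycle} entered after {tail} steps")
```

With only 65536 values a repeat is unavoidable within 65536 steps, and for a random-looking function it typically shows up much sooner. The same reasoning applies to the full 128-bit MD5, only with astronomically larger numbers.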
md5(md5(...))? What are you trying to do? Or do you think that it is more secure? I advise you to read about hash function collisions. - And