Hash collision

Martin Geisler mg at lazybytes.net
Wed Apr 29 08:17:00 CDT 2009


Brian Mearns <bmearns at ieee.org> writes:

> I'm curious about the possibility for collisions in the hashes that
> mercurial uses to identify all of its revisions. I realize there are
> yet to be any reported collisions in sha-1, but isn't it conceivable?

It is a fact that there *must* be collisions for SHA-1 and all other
hash functions, for that matter. SHA-1 takes an arbitrary long string
and turns it into 20 bytes (160 bit). So if you hash all 160 bit
strings through SHA-1, and then hash any 161 bit string, you must find
a collision.

> Does mercurial do anything to detect or defend against collisions,
> or is it really so unlikely that it's not worth worrying about?

Don't worry about it. There are 2^160 different hash values, and this
is a truely astronomical number. If a collision is ever found it will
be a major scientific discovery.

> And what exactly would be the result if, say, two revisions in a
> repo ended up with the same hash? Would it be disastrous for the
> repository, or just a minor snag?

I'm not really sure what would happen.

-- 
Martin Geisler

VIFF (Virtual Ideal Functionality Framework) brings easy and efficient
SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/.


More information about the Mercurial mailing list