SHA-1 is cryptographically weakened. Mercurial needs to switch to a strong hash function.

Goals

Possible Goals

TODO

Non-Goals

Goals Not Yet Classified

Selection of a Hash Algorithm

Mostly TODO. Blake2b at 30 or 31 bytes currently has the inside track.

Storage / Requirements Changes

A new repository requirement will need to be created to specify support for non-SHA-1 hashes.

There may need to be a repository requirement to specify the *primary* hash for new commits.

Revlogs already support 32 bytes for hash storage but only use 20 bytes for SHA-1. Assuming we use the existing revlog for storage, we'll reserve 1 or 2 bytes in the hash field to record the hash type then use the remaining bytes for hash storage. This allows multiple hash formats to be stored in the hash entry.

Future: in next revlog design, hash field should be variable width per revlog. This will allow using full 32 byte hashes and allow >32 byte hashes in the future. The revlog/store will need to be rewritten/upgraded to support wider hashes. But this one-time operation is acceptable because hash transitions should be rare.

Future: consider something like https://github.com/multiformats/multihash for declaring which hash is used. This will likely require a new revlog with >32 bytes for hash storage.

Wire Protocol Transition

Capabilities negotiation will need to exchange hash information and support.

Servers that have transitioned to a new hash will need to reject clients not supporting that hash and tell them to upgrade. The rejection should ideally be fast. This may be difficult in some cases because clients don't expose their features until bundle request time. We may have to error during discovery when SHA-1 hashes are used to request data stored under <HGHASH>.

TODO audit wire protocol and figure out how to do this.


CategoryNewFeatures CategoryDeveloper