SHA-1 is cryptographically weakened, so Mercurial needs to switch to a stronger hash function.
- The new hash algorithm should be cryptographically secure.
- The new hash algorithm should be fast, if possible (SHA-1 hashing is already a bottleneck in some operations).
- Mercurial should support N hash algorithms without requiring invasive changes to storage data structures, wire protocol communication, etc. (Whatever we replace SHA-1 with will presumably be broken in several years anyway, and we shouldn't need to retool everything to roll out the next hash algorithm.)
- The transition plan will be up to the repository owner, not a strict requirement tied to a specific version of Mercurial.
- Repos and servers will be able to have a flag day after which all new commits use a specific hash.
Commit signing implications. Commit signing and a cryptographic chain of custody are an independent topic (though related to repository security). See CommitSigningPlan for more.
Goals Not Yet Classified
- Do we support a repo owner deciding to rehash to a new algorithm? If so, how do we allow old hashes to be used for lookups (hgweb links that reference old hashes can't stop working)? Also, how do we mitigate downgrade attacks in this scenario?
Selection of a Hash Algorithm
Mostly TODO. BLAKE2b with a 30- or 31-byte digest (leaving room for a hash-type tag in the 32-byte revlog hash field) currently has the inside track.
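As a rough illustration of a 31-byte BLAKE2b node ID: Python's standard `hashlib.blake2b` takes a `digest_size` parameter (1 to 64 bytes), so the narrower width needs no truncation step. This sketch assumes the new hash keeps the scheme Mercurial uses for SHA-1 node IDs (hash the two parent node IDs in sorted order, then the revision text); `blake2b_node`, `NULL_NODE`, and the 31-byte choice are illustrative assumptions, not a decided design.

```python
import hashlib

DIGEST_SIZE = 31  # one candidate width; leaves 1 byte of a 32-byte field free
NULL_NODE = b"\0" * DIGEST_SIZE  # hypothetical null parent at this width


def blake2b_node(text: bytes, p1: bytes, p2: bytes) -> bytes:
    """Compute a node ID from revision text and parents (hypothetical scheme).

    Parents are hashed in sorted order, as Mercurial does for SHA-1,
    so swapping p1 and p2 yields the same node ID.
    """
    a, b = sorted((p1, p2))
    h = hashlib.blake2b(digest_size=DIGEST_SIZE)
    h.update(a)
    h.update(b)
    h.update(text)
    return h.digest()


node = blake2b_node(b"hello\n", NULL_NODE, NULL_NODE)
print(len(node), node.hex())
```

Note that `digest_size` is part of BLAKE2b's parameter block, so a 31-byte BLAKE2b digest is a different value from the first 31 bytes of the default 64-byte digest; whichever width is chosen has to be fixed per hash type.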
Storage / Requirements Changes
A new repository requirement will need to be created to specify support for non-SHA-1 hashes.
There may need to be a repository requirement to specify the *primary* hash for new commits.
Revlogs already support 32 bytes for hash storage but use only 20 bytes for SHA-1. Assuming we keep the existing revlog for storage, we'll reserve 1 or 2 bytes of the hash field to record the hash type and use the remaining bytes for the hash itself. This allows multiple hash formats to be stored in the hash entry.
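A minimal sketch of one way the 32-byte field could carry a type tag. The plan above doesn't say where the tag goes; this sketch assumes a single trailing tag byte with tag 0 meaning SHA-1, so existing entries (a 20-byte SHA-1 followed by zero padding) decode as SHA-1 without being rewritten. The tag values, the `blake2b-248` name, and the `pack_node`/`unpack_node` helpers are all hypothetical.

```python
import hashlib

FIELD_SIZE = 32  # width of the existing revlog hash field

# Hypothetical tag registry: tag byte -> (name, digest size).
# Tag 0 is SHA-1 so that legacy zero-padded entries decode unchanged.
HASH_TAGS = {
    0: ("sha1", 20),
    1: ("blake2b-248", 31),
}
TAG_BY_NAME = {name: tag for tag, (name, _size) in HASH_TAGS.items()}


def pack_node(algo: str, digest: bytes) -> bytes:
    """Lay out digest, zero padding, then a 1-byte tag in the last position."""
    tag = TAG_BY_NAME[algo]
    size = HASH_TAGS[tag][1]
    if len(digest) != size:
        raise ValueError("%s digest must be %d bytes" % (algo, size))
    return digest + b"\0" * (FIELD_SIZE - 1 - size) + bytes([tag])


def unpack_node(field: bytes) -> tuple[str, bytes]:
    """Return (algorithm name, digest) from a 32-byte hash field."""
    if len(field) != FIELD_SIZE:
        raise ValueError("hash field must be %d bytes" % FIELD_SIZE)
    tag = field[-1]
    if tag not in HASH_TAGS:
        raise ValueError("unknown hash tag %d" % tag)
    name, size = HASH_TAGS[tag]
    return name, field[:size]


# A legacy entry: 20-byte SHA-1 followed by 12 zero bytes.
sha1_digest = hashlib.sha1(b"revision").digest()
legacy = sha1_digest + b"\0" * 12
print(unpack_node(legacy)[0])  # sha1

b2 = hashlib.blake2b(b"revision", digest_size=31).digest()
print(unpack_node(pack_node("blake2b-248", b2))[0])  # blake2b-248
```

A 2-byte tag would work the same way with a 30-byte digest, trading digest width for more tag values.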
Future: in the next revlog design, the hash field should be variable width per revlog. This will allow using full 32-byte hashes, and >32-byte hashes in the future. The revlog/store will need to be rewritten/upgraded to support wider hashes, but this one-time operation is acceptable because hash transitions should be rare.
Future: consider something like https://github.com/multiformats/multihash for declaring which hash is used. This will likely require a new revlog with >32 bytes for hash storage.
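For comparison, a multihash value is the hash function's code and the digest length, each encoded as an unsigned varint, followed by the digest. The sketch below encodes and decodes that layout. The two function codes (0x11 for SHA-1, 0xb220 for BLAKE2b-256) come from the multicodec table and should be double-checked against it; the helper names are illustrative.

```python
import hashlib

# Multicodec function codes (verify against the multicodec table).
SHA1 = 0x11
BLAKE2B_256 = 0xB220


def encode_varint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used by multihash."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Return (value, position just past the varint)."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def multihash_encode(code: int, digest: bytes) -> bytes:
    return encode_varint(code) + encode_varint(len(digest)) + digest


def multihash_decode(mh: bytes) -> tuple[int, bytes]:
    code, pos = decode_varint(mh, 0)
    length, pos = decode_varint(mh, pos)
    digest = mh[pos:pos + length]
    if len(digest) != length:
        raise ValueError("truncated multihash")
    return code, digest


sha1_mh = multihash_encode(SHA1, hashlib.sha1(b"data").digest())
b2_digest = hashlib.blake2b(b"data", digest_size=32).digest()
b2_mh = multihash_encode(BLAKE2B_256, b2_digest)
print(len(sha1_mh), len(b2_mh))  # 22 36
```

A SHA-1 multihash (22 bytes) still fits the current 32-byte field, but a full 32-byte digest plus its prefix takes 36 bytes, which is why a multihash-based design would likely need a wider revlog hash field.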
Wire Protocol Transition
Capability negotiation will need to exchange information about which hash algorithms each side supports.
Servers that have transitioned to a new hash will need to reject clients that don't support that hash and tell them to upgrade. Ideally, the rejection should happen early in the exchange. This may be difficult in some cases because clients don't expose their features until bundle request time, so we may have to error during discovery when SHA-1 hashes are used to request data stored under <HGHASH>.
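Where the client does advertise its capabilities, negotiation plus the upgrade rejection could look roughly like this. The `hashes=` capability token, the hash names, and the function names are hypothetical; the sketch assumes a client that advertises no hash list understands only SHA-1, and it does not address the harder case above where client features are unknown until bundle request time.

```python
# Hypothetical capability name; Mercurial does not currently define it.
HASHES_CAP = "hashes"


class UpgradeRequired(Exception):
    """Raised when client and server share no usable hash algorithm."""


def parse_capabilities(capstring: str) -> dict[str, str]:
    """Split a space-separated capability string into name -> value."""
    caps = {}
    for token in capstring.split():
        name, _, value = token.partition("=")
        caps[name] = value
    return caps


def negotiate_hash(client_capstring: str, server_hashes: list[str]) -> str:
    """Pick the server's most preferred hash that the client supports.

    server_hashes lists the hashes the server accepts, most preferred first;
    after a flag day it might contain only the new hash.
    """
    caps = parse_capabilities(client_capstring)
    if HASHES_CAP in caps:
        client_hashes = set(filter(None, caps[HASHES_CAP].split(",")))
    else:
        client_hashes = {"sha1"}  # assumption: legacy clients know only SHA-1
    for name in server_hashes:
        if name in client_hashes:
            return name
    raise UpgradeRequired(
        "this repository requires one of: %s; please upgrade Mercurial"
        % ", ".join(server_hashes)
    )


print(negotiate_hash("lookup hashes=sha1,blake2b-248", ["blake2b-248", "sha1"]))
```

Raising at negotiation time gives the fast failure the plan asks for, but only when the client's hash support is known that early.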
TODO: audit the wire protocol and figure out how to do this.