Commit Signing Plan
- Problem Statement
- Background and Threat Models
- Desired Scenarios and Workflows
- GPG Extension and Its Limitations
- Proposed Solution: In-Commit Signing
- Concerns and Open Issues
- Comparison to Other VCS Systems
1. Problem Statement
Mercurial should provide stronger guarantees about the authenticity of commits, including who made them and optionally who "signed off" on them. It should do so in a manner that scales to large repos with high commit velocity and doesn't lock users into one specific workflow. See also CommitCustodyConcept.
2. Background and Threat Models
2.1. Spoofable Author Field
Mercurial only has a single field for author information. It captures a name and email for the person or entity making the commit. However, any user can set any value for the author field. This opens up the possibility for spoofing.
A nefarious person could create a commit that appears to be coming from a well-known and trusted individual. This form of "social engineering attack" could result in a reviewer letting his or her guard down ("person X writes good code: I don't have to pay too much attention") and malicious code being inserted into a repository without deserved scrutiny.
The viability of this attack varies, as many workflows use tools with accounts and these tools commonly expose account info that could be used to reinforce the identity of the patch author/submitter.
2.2. Transport Level Patch Rewriting
Many projects use insecure transmission of patches (email) or third party hosting of commit data (e.g. pull requests). Even self-hosted Mercurial repositories are one exploit away from remote rewriting (the exploit could be in your OS or HTTP server).
A party with MITM capabilities, the ability to coerce a third-party hosting provider (such as through a secret court order), or the ability to compromise a machine running a Mercurial server could alter the contents of commits between when the author wrote them and when a trusted party looks at them. Such rewriting could introduce a vulnerability and could go unnoticed, as people tend to gloss over things like exact SHA-1s, especially before commits are published and before any work is derived from them (so there is no divergence to notice).
Allowing patch authors to sign commits gives them confidence that tampering of their patches would be noticed.
2.3. Lack of Formal Sign-Off and Trust in Sign-Off
Mercurial currently doesn't formally record who "signed off" on a commit. Many projects have adopted a "two person rule" where any new commit requires at least 2 people: an author and a separate (trusted) person to sign off on it. Organizations like Mozilla have resorted to annotating commit messages with this metadata. e.g. "r=indygreg" (this means "positively reviewed by indygreg").
Anybody who can edit a commit message can add this metadata and create falsified entries.
A nefarious individual could construct a commit (message) that appears to have sign-off and then convince someone to land it.
2.4. Disagreements Between Author and Committer
In many workflows, a committer/reviewer may modify code before landing it. e.g. they will fix style nits to avoid another submission cycle.
While these workflows can have benefits, they may not always be wanted. For example, when dealing with security and crypto code, a competent author may insist they have a final say before code lands. This would help prevent inadvertent "in flight" changes from the committer introducing bugs, etc.
A similar issue is when a reviewer says a change is good but the author changes something before landing in a way that would invalidate the sign off.
A more formalized method for verifying changes land exactly as intended could prevent disagreements between author and committer and could be formalized via repository hooks to enforce a "two person rule" where one of the people is the author.
2.5. Lack of Push Log
If a falsified commit gets introduced to the repository, it isn't always clear how it got there because the Mercurial server does not keep a formal log of this. This problem has more or less been solved by the pushlog extension. However, this data only establishes a paper trail: it doesn't provide proactive detection against falsified entries being introduced.
Mozilla's pushlog extension also has a weak point: single point of failure. The log is created on the server and can't be cryptographically proven. There is trust that the server is telling the truth.
3. Desired Scenarios and Workflows
3.1. Commits Transferred Verifiably
A commit author should be able to transmit a commit to another entity via an insecure transmission channel (such as email) and the receiving party should be able to verify the commit arrived untampered. Comparing SHA-1s is not sufficient because it requires the author to transfer the SHA-1 to the receiver and we don't trust the transmission channel.
The receiver should be able to verify with cryptographic certainty, from details in the patch/commit representation, that the author is who they say they are and that the diff arrived without tampering.
3.2. Verified Sign-Off of Specific Files or Directories
A repository consumer should be able to verify that all changes to specific files or directories have proper signatures and were signed off by trusted parties.
3.3. Require Sign-Off on Push
Repository operators should be able to deploy hooks that enforce requirements that pushed commits have proper sign-off.
It should be possible to require sign-off on all commits or just commits touching certain files or directories.
It should be possible to require certain signing keys be used for changing specific files or directories.
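A minimal sketch of the kind of check such a hook could perform, assuming a hypothetical policy format that maps path prefixes to the key fingerprints allowed to sign off on them. The names POLICY, required_keys, and check_commit are illustrative and are not part of Mercurial; signature verification itself is assumed to have happened elsewhere.

```python
# Hypothetical policy: path prefix -> key fingerprints allowed to sign off.
# An empty prefix matches every file in the repository.
POLICY = {
    "": {"KEY_ALICE", "KEY_BOB", "KEY_CAROL"},
    "mercurial/crypto/": {"KEY_ALICE"},
}


def required_keys(path, policy):
    """Return the allowed signer set of the longest prefix matching path."""
    matches = [prefix for prefix in policy if path.startswith(prefix)]
    if not matches:
        return None  # no policy applies to this path
    return policy[max(matches, key=len)]


def check_commit(changed_files, valid_signers, policy):
    """Return the changed files that lack an acceptable sign-off.

    valid_signers is the set of key fingerprints whose signatures on the
    commit have already been verified by some other means.
    """
    rejected = []
    for path in changed_files:
        allowed = required_keys(path, policy)
        if allowed is not None and not (allowed & valid_signers):
            rejected.append(path)
    return rejected
```

A push hook built on this would reject the push whenever check_commit returns a non-empty list for any incoming changeset.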
3.4. Require Author and Sign-Off Agreement
It should be possible to require that a commit have signatures from both the author and a sign-off party.
This ensures that authors don't have their work changed before push without approval.
4. GPG Extension and Its Limitations
Mercurial ships with a "gpg" extension that allows commits to be signed with GPG. Signing works as follows:
- Find the SHA-1 of a changeset to be signed
- Produce a GPG signature of that SHA-1
- Append the signature to the .hgsigs file
- Commit the result
See b09e5150bf8f for an example commit.
This is the only mechanism currently built into Mercurial to establish a chain of trust for a commit.
There are some limitations with the existing extension.
First, it isn't practical to sign every commit to the repository. This is because every signing operation requires a new commit to record the added signature(s). This effectively means one extra commit per push operation. In practice, nobody takes this approach. Instead, only a small number of commits are signed. Commonly, it's only release commits or tag commits that are signed.
Second, commit signing isn't scalable for high commit volume workloads. Organizations like Facebook and Mozilla commit to repositories so frequently that there are "push races" to repositories and pushers typically need to rebase before pushing. Since a rebase would rewrite the commit's SHA-1, it would invalidate the GPG signature and require re-signing. This would require one of the following to overcome:
- Signers would need to take responsibility for pushing commits they sign off on.
- A different entity would have to re-sign the commits.
The first option may be unacceptable to some organizations and workflows, as it effectively requires that the person doing sign-off is the person pushing changesets. There is overhead here.
The second option would break the chain of trust from the original signer. e.g. if the server re-signed commits as they were rebased on the server, you now have a single point of failure to attack: the server doing the re-signing. If you can compromise that single server, all bets are off. Breaking the chain of trust undermines a purpose of signed commits and is thus an inadequate solution for people wanting signed commits for trust chain verification.
5. Proposed Solution: In-Commit Signing
The issues of extra commits and rebasing losing signatures can be worked around by introducing a new method of commit signing.
A commit's SHA-1 is derived from the manifest (the content of all files in the repository at the time of the commit), all ancestor commits, and fields of the commit itself such as date, author, and commit message. Instead of signing that SHA-1, we will sign a representation covering just the changes in the commit. This signature will be added to the commit itself, so signing doesn't require additional commits.
GPG will be used for signing (unless there is a better idea).
Generically, the process for signing a changeset is thus:
- Build a representation of the changes made in a changeset
- Sign that representation
- Add signature and representation method to an extra field in the changeset and commit the amended result
This is equivalent to adding the result of gpg --detach-sign to a changeset.
The process for verifying a changeset is thus:
- Read the representation method from signature metadata
- Build a representation of what was signed
- Verify the signature of the representation is valid
Note: it isn't necessary to hash the representation before signing if signature length isn't proportional to input data length. Eliminating hashing removes a potential weakness: if the hash algorithm is ever found to be cryptographically unsafe, the signatures aren't exposed to attacks that produce the same hash for different content, at least not beyond how Mercurial is already vulnerable due to using hashes to represent file contents.
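The signing and verification steps above can be sketched as follows. This is only an illustration under loud assumptions: toy_sign uses an HMAC with a shared secret as a stand-in for a real detached GPG signature, and the extra field names "sig" and "sig-method", the method name "v0", and the commit dictionary layout are all hypothetical.

```python
import hashlib
import hmac


def represent_v0(commit):
    """Hypothetical representation: metadata plus changed file hashes."""
    lines = [commit["user"], commit["date"], commit["desc"]]
    lines += ["%s %s" % item for item in sorted(commit["files"].items())]
    return "\n".join(lines).encode("utf-8")


# Hypothetical registry: method name -> function building the bytes to sign.
METHODS = {"v0": represent_v0}


def toy_sign(key, data):
    """Stand-in for `gpg --detach-sign`; NOT a real public-key signature."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def sign_commit(commit, key, method="v0"):
    """Build the representation, sign it, and store both in extra."""
    representation = METHODS[method](commit)
    extra = dict(commit.get("extra", {}))
    extra["sig-method"] = method
    extra["sig"] = toy_sign(key, representation)
    return dict(commit, extra=extra)


def verify_commit(commit, key):
    """Rebuild the representation named in extra and check the signature."""
    extra = commit.get("extra", {})
    build = METHODS.get(extra.get("sig-method"))
    if build is None or "sig" not in extra:
        return False
    expected = toy_sign(key, build(commit))
    return hmac.compare_digest(expected, extra["sig"])
```

Because the method name travels with the signature, a verifier can rebuild exactly the bytes that were signed even if newer methods exist.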
5.1. Creating Representations of Commits
- Full manifest node will be omitted
- Extra fields belonging to the fields used to hold signatures will be omitted
- Parent changesets will not be included
In the absence of the full manifest node (which is a representation of the state of every file in the commit), we will construct a partial manifest consisting of just the files changed by the commit. Like normal manifests, the hash of the content after the commit will be used (this is easier than producing a diff, which can be represented N different ways). We will need to include an explicit list of deleted files, since these aren't explicitly captured by manifests. e.g.
mercurial/hg.py 23cc12f225f1b42f32dc0d897a4f95a38ddc8f4a
mercurial/deleted.py 217bc3fde6d82c0210cf56aeae11d05a03f35b2b d
The representation of a commit is thus stable as long as the following conditions are met:
- The commit date, author, message, branch, and any other fields stored in "extra" (excluding those used to hold signatures) are not modified
- If the commit is rebased, no file merges occur (the end state of files modified by the commit does not change)
The representation of a commit is thus conveying the commit metadata and end state of files changed by the commit (as opposed to commit data, all parent commits, and end state of all files in the repo at the time of the commit). The produced representation and signature sacrifices some details to achieve flexibility and usability.
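A rough sketch of such a representation builder, to make the stability property concrete. The commit dictionary layout and the line-based serialization are assumptions made for illustration, as is the choice to record a deleted file with its last known hash; a real format would also need unambiguous framing for multi-line commit messages.

```python
# Extra fields whose names start with these prefixes hold signatures.
SIGNATURE_PREFIXES = ("Author-Signature", "Sign-Off-Signature")


def build_representation(commit):
    """Serialize the signed parts of a commit into bytes.

    Parents and the full manifest node are deliberately left out, and
    extra fields that hold signatures are skipped.
    """
    lines = [
        "user " + commit["user"],
        "date " + commit["date"],
        "desc " + commit["desc"],
    ]
    for key, value in sorted(commit.get("extra", {}).items()):
        if not key.startswith(SIGNATURE_PREFIXES):
            lines.append("extra %s=%s" % (key, value))
    # Partial manifest: changed files with their post-commit hash, then
    # deleted files marked with a trailing "d".
    for path, node in sorted(commit["changed"].items()):
        lines.append("%s %s" % (path, node))
    for path, node in sorted(commit["deleted"].items()):
        lines.append("%s %s d" % (path, node))
    return "\n".join(lines).encode("utf-8")
```

Two commits that differ only in their parents or in their signature fields produce identical bytes here, which is what lets a signature survive a merge-free rebase.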
5.2. Storing Signatures
Signatures will be stored in "extra" fields as part of the changeset. The following fields will be added (bikeshedding over names is needed):
The Author-Signature-Method field will capture details on how the representation that the author signed was produced.
The Author-Signature field will capture a signature from the author of the commit. Mercurial will verify that the key used to produce this signature matches the author field in the commit. This field and signature can be used to verify that commits are coming from the person the author field says they are coming from, thus preventing spoofing in the author field. This field is arguably not as important as establishing trust for sign-off.
The Sign-Off-Signature-N-Method fields capture details on how the representation signed by each Sign-Off-Signature-N was produced.
The Sign-Off-Signature-N fields (where N is an integer) will hold signatures from people signing off on the commit. These signatures can be used to verify that a trusted person reviewed the change and that the change landed exactly as the reviewer intended. We'll start at count 0 or 1 and append new signatures to the end as they arrive.
All Sign-Off-Signature-* fields will be ignored when computing the representation of a commit for in-commit signing. This allows signatures to be added or removed without requiring re-signing.
The exact format of the "Method" fields is TBD.
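As an illustration of how the numbered sign-off fields could be managed, here is a small sketch that treats "extra" as a plain dictionary. It assumes counting starts at 0 and that each method field is named Sign-Off-Signature-N-Method; both are open choices in this plan, and the helper names are hypothetical.

```python
import re

# Matches signature fields only, not their "-Method" companions.
SIGNOFF_RE = re.compile(r"^Sign-Off-Signature-(\d+)$")


def add_signoff(extra, signature, method):
    """Return a copy of extra with one more sign-off signature appended."""
    used = [int(m.group(1)) for m in map(SIGNOFF_RE.match, extra) if m]
    index = max(used) + 1 if used else 0  # assumption: counting starts at 0
    updated = dict(extra)
    updated["Sign-Off-Signature-%d" % index] = signature
    updated["Sign-Off-Signature-%d-Method" % index] = method
    return updated


def strip_signoffs(extra):
    """Drop all Sign-Off-Signature-* fields, as done before signing."""
    return {k: v for k, v in extra.items()
            if not k.startswith("Sign-Off-Signature-")}
```

Since strip_signoffs removes every sign-off field before the representation is built, adding a second reviewer's signature leaves the first reviewer's signature valid.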
6. Concerns and Open Issues
6.1. ctx.files() Validity
ctx.files() is never validated to be accurate. If ctx.files() is relied upon, a malicious person could defeat signing by producing changesets with incomplete file lists.
We may need to walk manifests to ensure the set of changed files is accurate. Or, we may want to change behavior of Mercurial to more strongly validate the set of changed files in the changelog is accurate. This could have performance implications.
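A sketch of what walking manifests could look like for a non-merge changeset, modelling each manifest as a dictionary from path to file node. The function names are hypothetical, and this does not use Mercurial's actual manifest API.

```python
def manifest_diff(parent_manifest, child_manifest):
    """Return the set of paths added, modified, or removed between manifests.

    Each manifest is modelled as a dict mapping path -> file node.
    """
    paths = set(parent_manifest) | set(child_manifest)
    return {p for p in paths
            if parent_manifest.get(p) != child_manifest.get(p)}


def files_list_is_accurate(declared_files, parent_manifest, child_manifest):
    """Check a changeset's declared file list against the real changes."""
    return set(declared_files) == manifest_diff(parent_manifest,
                                                child_manifest)
```

The cost is a comparison of two full manifests per changeset, which is where the performance concern comes from on large repositories.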
6.2. Configuring What is Signed
Like existing full-commit signatures, in-commit signatures could still get invalidated in a lot of workflows. If a file-level merge occurs on rebase, the signature becomes invalid. If the commit message changes, the signature becomes invalid. This may cause excessive churn and require re-signs. Security or convenience: pick one.
We should consider allowing signers to choose what data is encapsulated by a signature. i.e. you should be able to sign the state of the files but not the commit message or date.
Having configurable scope would require we store the scope of what's signed somewhere next to the signature. This could be as simple as a space or comma delimited list of strings or characters (or even numeric enumerations if we wanted to save space).
Configurable scope also addresses another concern: future compatibility. If Mercurial grows new fields, clients should still be able to verify signatures produced with the old signing method. New clients should be able to switch to new signatures as soon as they are available. We almost certainly will need to include a "signature method" marker next to the signature. This could be as simple as an integer version number.
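One possible shape for a scope marker, sketched under assumptions: a method string of the form "version:comma-separated scope names" stored next to the signature. The scope names, the string format, and the commit dictionary layout are all hypothetical.

```python
# Hypothetical scope names; each maps to a function extracting text.
SCOPE_FIELDS = {
    "user": lambda c: c["user"],
    "date": lambda c: c["date"],
    "desc": lambda c: c["desc"],
    "files": lambda c: "\n".join(
        "%s %s" % item for item in sorted(c["files"].items())),
}


def parse_method(method):
    """Split a method string such as "1:files,user" into version and scope."""
    version, _, scope = method.partition(":")
    names = [n for n in scope.split(",") if n]
    unknown = [n for n in names if n not in SCOPE_FIELDS]
    if unknown:
        raise ValueError("unknown scope entries: %s" % ", ".join(unknown))
    return int(version), names


def scoped_representation(commit, method):
    """Build the bytes to sign, covering only the fields named in method."""
    version, names = parse_method(method)
    # The method itself is part of the signed bytes so it cannot be swapped.
    parts = ["method %d:%s" % (version, ",".join(sorted(names)))]
    parts += ["%s %s" % (n, SCOPE_FIELDS[n](commit)) for n in sorted(names)]
    return "\n".join(parts).encode("utf-8")
```

With a scope of "1:files", editing the commit message leaves the signed bytes unchanged, while a scope of "1:files,desc" would invalidate the signature on any message change.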
6.3. File Nodes in Manifests Derived from Parent
The file nodes used by manifests are not simply hashes of file content: they also include the parent nodes from the filelog.
What impact does this have, if any?
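To make the question concrete, here is a sketch of the node computation as revlogs perform it: a SHA-1 over the two parent nodes in sorted order followed by the revision text. Copy/rename metadata that filelogs can prepend to the text is ignored here, and the function name is illustrative.

```python
import hashlib

NULLID = b"\0" * 20  # node used for a missing parent


def filenode(content, p1=NULLID, p2=NULLID):
    """SHA-1 over the sorted parent nodes followed by the revision text."""
    h = hashlib.sha1(min(p1, p2))
    h.update(max(p1, p2))
    h.update(content)
    return h.digest()
```

Identical content recorded on top of a different filelog parent gets a different node. This suggests that a representation built from file nodes stays stable across a rebase only as long as the filelog parents of the changed files stay the same.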
6.4. Adding Signatures Invalidates Changeset Nodes
Adding signatures to commits would rewrite the commit and invalidate the previous SHA-1. This would require a lot of rebasing. Security or convenience: pick one.
6.5. In-Commit Signatures Limit Post-Landing Signing
One of the benefits of the existing .hgsigs tags signing solution is that anyone can add a signature for any tag at any time.
Having signatures be part of the changeset and changelog means that once a changeset is published, adding signatures and rewriting history would be strongly frowned upon by most users.
Having a post-landing mechanism for signing commits is tempting.
If signatures exist outside of changesets, we need to solve a data syncing problem.
External signatures are also a regression in protection: signatures could be modified and this could be difficult to notice. Having the signatures baked into the changeset/changelog has the nice property that signatures influence SHA-1s and rewriting would almost certainly be noticed.
6.6. Merges and Signing
It's worth explicitly mentioning how merges and signing interact.
Merges are commits. But they are different from "normal" commits in that in many workflows there is a single person doing the merging, not two people collaborating.
Merges produce new manifests - new versions of files. As such, they represent an opportunity to inject bad or undesired code. For example, say you have an integration branch into mainline. The integration branch contains signed changes to highly-sensitive crypto code. A repo maintainer may periodically merge the integration branch into mainline. Do you trust that this repo maintainer committed the proper version of the changed crypto files? Do you want to extend trust to anyone who does a merge? For some projects, this seems like an excessive grant of trust.
Merges MUST be signed to preserve the chain of trust. However, many people view merges as just another VCS operation and forget they often produce new versions of files.
If security and chain of trust are important, it is probably a good idea to go with a linear repo history model instead of relying on hard-to-review-and-sign merges. Since the signature proposal outlined above facilitates rebasing without invalidating signatures, linear history without merges is achievable.
6.7. Defending Against Hash Vulnerabilities
The SHA-1 of file content (among other things) is used to represent the state of a file. When SHA-1 becomes cryptographically unsound, attackers may be able to subtly change file content without SHA-1 changing. Existing signatures would still be valid.
Using raw file content instead of SHA-1 for producing representation of files would be preferred. But this could be computationally expensive.
Another mechanism is to add additional representations of the data. For example, file size could be used.
We could also generate a diff that is validated against its parent. However, diffs can be produced in many different ways, and we would need to guarantee forward compatibility of diff generation to preserve signature validity.
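A small sketch of the "additional representations" idea: each file entry carries the existing file node plus the file size and a second digest of the raw content. The entry format and the choice of SHA-256 as the second digest are assumptions for illustration only.

```python
import hashlib


def file_entry(path, content, filenode_hex):
    """Describe one changed file with redundant, independent checks."""
    return "%s %s size=%d sha256=%s" % (
        path,
        filenode_hex,
        len(content),
        hashlib.sha256(content).hexdigest(),
    )
```

An attacker would then need content that matches the SHA-1 based node, the size, and the second digest at once. The price is reading the full content of every changed file when signing and verifying.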
7. Comparison to Other VCS Systems
Git supports built-in GPG signing of tags and commits. More info is available at Git-Tools-Signing-Your-Work.
On the technical level, GPG signatures are embedded in the Git commit object. e.g.
$ git cat-file -p 4a4018831d2ebc3c9cae9c6613e6a2497b4f0993
tree 4d5fcadc293a348e88f777dc0920f11e7d71441c
parent fada8be975cc2991c8eb684dd9f5718a213c958c
author Gregory Szorc <email@example.com> 1428111907 -0700
committer Gregory Szorc <firstname.lastname@example.org> 1428111907 -0700
gpgsig -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVH0IjAAoJECDco9/l6Zyz6KoQAK1oUhzW0EOMEQi4GSKyZ1V8
 9cTLky7Xayoth94msQ5XNb0QKIGyqL4M06MVKzQTo7QOdqOCZxqlCPChOMqavojd
 ...
 i1n9LCWXpeQT1JWzwxbl
 =6N2d
 -----END PGP SIGNATURE-----

commit message
The content being signed is essentially the commit object minus the lines belonging to the GPG signature. Included in that content is the tree object (effectively equivalent to Mercurial's manifest) and the list of parent commits. This means that rebases nullify signatures (the parent changes and the tree likely changes as well).
Not included in content being signed is the commit SHA-1 itself. The commit SHA-1 is derived from the content of the commit object including the GPG signature. This means you cannot sign old commits without changing their SHA-1. This will, of course, invalidate any SHA-1 of children commits.
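The "commit object minus the signature lines" idea can be sketched as a function over the raw bytes of a commit object. It relies on header continuation lines starting with a single space; the function name is illustrative and this is not Git's own code.

```python
def signed_payload(raw_commit):
    """Return a raw Git commit object with the gpgsig header removed.

    Header continuation lines start with a single space, so they are
    dropped together with the "gpgsig" line that opens the signature.
    """
    headers, sep, message = raw_commit.partition(b"\n\n")
    kept = []
    skipping = False
    for line in headers.split(b"\n"):
        if line.startswith(b"gpgsig "):
            skipping = True
        elif skipping and line.startswith(b" "):
            continue
        else:
            skipping = False
            kept.append(line)
    return b"\n".join(kept) + sep + message
```

The tree and parent headers remain in the returned bytes, which is why a rebase (new parent, usually new tree) changes the signed content and nullifies the signature.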
Git also supports signed pushes. It works like the following:
- During push, after discovery, Git generates a representation of the updates to be made to the server (who is making them, ref changes, etc).
- Git prompts user to GPG sign that representation
- The signed object is sent to the server (as part of the transferred packfile)
- Server can inspect signature via hooks and optionally log actions
The signed push blob appears to not be retained by the client or server after the push: that is left as something a repo maintainer must configure. If the signed push blobs are retained, it effectively constitutes a cryptographically strong log of pushes.
Git Horror Story is a good overview of the state of signing in Git. It is also a good overview of trust in VCS systems.
http://grimoire.ca/git/detached-sigs is an interesting proposal about detached signatures in Git.
http://karl.kornel.us/2017/10/welp-there-go-my-git-signatures/ discusses some of the problems with key expiration/revocation in Git.
Monotone has signing and key management built-in.
The Certificates Documentation gives an overview of what signing looks like. Essentially there are "statements" attached to commits. Each statement has a name and a value. e.g. "date" and "2015-04-02 13:37:00". Statements can be signed by RSA keys, at which point it becomes a "certificate." Anything beyond files, manifests, and revisions is stored and transmitted as certificates. This includes the date, author, branch, and commit message (changelog). Essentially all metadata is signed by default and there is no way to not do things that way.
Monotone handles creation and management of keys for you. It also has built-in hook points for dealing with trust granting based on signing. e.g. "trust key X to sign property Y from revision Z?"