Commit Audit Trails

Motivation

Accurately tracking a commit's "chain of custody" can be difficult in a distributed environment, even when all work is done in the open. Commits are created at a different time from when they are added to canonical history, and are typically changed in the process (perhaps by being rebased onto other new commits). In a DVCS, this means that a commit whose custody changes K times will have K different IDs. Beyond chain of custody, organizations often encode workflow in commit messages by embedding a vocabulary of "attestations"; code review and testing sign-offs are two notable examples.

Background

At present, Mercurial offers no standard mechanism for distinguishing the author of a commit from the committer (the person who added the commit to a given repository); it records only an author field. This limits the extent to which Mercurial can demonstrate changeset provenance. Because history modification operations such as graft, rebase and histedit are supported, repository maintainers can easily modify changesets authored by others without leaving any record of having done so.

The EvolveExtension facilitates workflows that rewrite history by storing and sharing out-of-band "obsolescence markers" that specify temporal relationships between local commits. These markers could be used to construct an audit trail only if every "revision" of a commit were stored locally, but they deliberately prevent all but the latest revision of a commit from being shared.

Design

Each changeset has zero or more audit trail entries, totally ordered and numbered 1..N (or 0..N when the optional 0th entry is present). Lower-numbered entries occur before higher-numbered entries. The 0th entry is optional and contains attestations by the original changeset author. Whenever a changeset is modified, an audit trail entry is either created or updated as appropriate.

Each entry in the audit trail is represented as an extra field with a well-known key of the form "sigK", where 0 ≤ K ≤ N is the entry's order. The value of the extra field is a canonical encoding of the (non-empty) set of attestations in the audit entry.
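The page does not pin down the "canonical encoding" of an entry. As a minimal sketch, assuming one `name=value` line per attestation sorted by name (so equal attestation sets always encode to equal strings), the key naming and encoding could look like this. `sig_key`, `encode_entry` and `decode_entry` are hypothetical helpers, not Mercurial APIs.

```python
import re

# Hypothetical encoding: one "name=value" line per attestation, sorted by name.
# Attestation names may not contain "=", so the first "=" separates name and value.
NAME_RE = re.compile(r"[A-Za-z0-9.-]+")


def sig_key(k: int) -> str:
    """Extra-field key holding the k-th audit trail entry."""
    return "sig%d" % k


def encode_entry(attestations: dict) -> str:
    """Canonically encode a non-empty set of attestations."""
    if not attestations:
        raise ValueError("an audit entry must contain at least one attestation")
    lines = []
    for name in sorted(attestations):
        value = attestations[name]
        if not NAME_RE.fullmatch(name):
            raise ValueError("invalid attestation name: %r" % name)
        if "\n" in value:
            raise ValueError("values may not contain newlines in this sketch")
        lines.append("%s=%s" % (name, value))
    return "\n".join(lines)


def decode_entry(encoded: str) -> dict:
    """Inverse of encode_entry."""
    result = {}
    for line in encoded.split("\n"):
        name, sep, value = line.partition("=")
        if not sep or not NAME_RE.fullmatch(name):
            raise ValueError("malformed attestation line: %r" % line)
        result[name] = value
    return result
```

Sorting by name is what makes the encoding canonical: two custodians who build the same attestation set in a different order still produce byte-identical extra values, which matters because the verification procedure compares entries for equality.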

An attestation is a name-value pair. Names contain only alphanumeric characters, dots and hyphens. Attestation names starting with "hg." are standard and are supported by any sufficiently recent version of core Mercurial. Each entry must provide the following standard attestations:

  attestation     value
  hg.custodian    RFC822 e-mail address of the user creating the entry
  hg.date         date of the entry, in the same format as revlog dates
  hg.link         hex ID of the changeset as it was before this entry was added
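A verifier would likely also want to reject malformed entries before walking the trail. The sketch below checks the naming rule and the three required attestations. It assumes that "same format as revlog dates" means `<unixtime> <tzoffset>` as two integers, that a 40-digit lowercase hex string is an acceptable `hg.link`, and it only loosely checks the e-mail address via `email.utils.parseaddr`; `entry_problems` is a hypothetical helper.

```python
import re
from email.utils import parseaddr

REQUIRED = ("hg.custodian", "hg.date", "hg.link")
NAME_RE = re.compile(r"[A-Za-z0-9.-]+")
HEX_NODE_RE = re.compile(r"[0-9a-f]{40}")  # assumption: full 40-digit hex IDs
DATE_RE = re.compile(r"-?\d+ -?\d+")       # assumption: "<unixtime> <tzoffset>"


def entry_problems(attestations: dict) -> list:
    """Return human-readable problems with one audit entry ([] if none)."""
    problems = []
    # Every name must use only alphanumerics, dots and hyphens.
    for name in attestations:
        if not NAME_RE.fullmatch(name):
            problems.append("invalid name %r" % name)
    # The three standard attestations are mandatory.
    for name in REQUIRED:
        if name not in attestations:
            problems.append("missing %s" % name)
    custodian = attestations.get("hg.custodian")
    if custodian is not None and "@" not in parseaddr(custodian)[1]:
        problems.append("hg.custodian is not an e-mail address")
    date = attestations.get("hg.date")
    if date is not None and not DATE_RE.fullmatch(date):
        problems.append("hg.date is not '<unixtime> <tzoffset>'")
    link = attestations.get("hg.link")
    if link is not None and not HEX_NODE_RE.fullmatch(link):
        problems.append("hg.link is not a 40-digit hex changeset ID")
    return problems
```

Returning a list rather than raising on the first error lets a tool report every defect in an entry at once. Note that these checks establish well-formedness only; as discussed under Security, they say nothing about whether the custodian is genuine.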

Full Verification

To verify the integrity of an audit trail, Mercurial must verify each changeset in the chain from the most recent signature back to the unaudited changeset by following hg.link attestations. The result is one of success, failure, or indeterminate (when not all necessary revisions are available locally).

Starting at the latest revision of the changeset, C:

  1. Verify C's changeset hash.
    a. If hash verification fails, terminate with failure.
  2. If C has no signatures, terminate with success.
  3. Let D be the changeset identified by the hg.link value of C's latest signature, sig[N].
  4. If D is not present locally, terminate with indeterminate.
  5. Verify that C and D have identical sig[0], sig[1], ..., sig[N-1].
    a. If all of these signatures match, let C = D and go to step 1.
    b. If any signature other than sig[N] differs between C and D, terminate with failure.
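The steps above can be sketched as a loop over an in-memory repository. This is a simplified model, not Mercurial code: `Changeset`, `compute_node` and the dict-based `repo` are hypothetical, `compute_node` is a toy SHA-1 over a stand-in `content` field plus the extras (Mercurial's real hash also covers parents and other metadata), and entries are assumed to use one `name=value` line per attestation.

```python
import hashlib
import re
from dataclasses import dataclass, field

SUCCESS, FAILURE, INDETERMINATE = "success", "failure", "indeterminate"
SIG_KEY_RE = re.compile(r"sig(\d+)")


@dataclass
class Changeset:
    node: str                 # hex ID as recorded in the repository
    content: bytes            # stand-in for the rest of the hashed data
    extra: dict = field(default_factory=dict)


def compute_node(cs: Changeset) -> str:
    """Toy stand-in for Mercurial's changeset hash."""
    h = hashlib.sha1(cs.content)
    for key in sorted(cs.extra):
        h.update(b"\0%s=%s" % (key.encode(), cs.extra[key].encode()))
    return h.hexdigest()


def signatures(cs: Changeset) -> dict:
    """Map entry order K -> encoded entry for every sigK extra."""
    sigs = {}
    for key, value in cs.extra.items():
        m = SIG_KEY_RE.fullmatch(key)
        if m:
            sigs[int(m.group(1))] = value
    return sigs


def link_of(entry: str):
    """Return the hg.link value of an encoded entry, or None."""
    for line in entry.split("\n"):
        name, _, value = line.partition("=")
        if name == "hg.link":
            return value
    return None


def verify(repo: dict, node: str) -> str:
    c = repo.get(node)
    if c is None:
        return INDETERMINATE
    while True:
        # Step 1: verify C's changeset hash.
        if compute_node(c) != c.node:
            return FAILURE
        # Step 2: an unaudited changeset ends the walk successfully.
        sigs = signatures(c)
        if not sigs:
            return SUCCESS
        # Step 3: D is named by hg.link of the latest entry, sig[N].
        n = max(sigs)
        d_node = link_of(sigs[n])
        if d_node is None:
            return FAILURE  # malformed entry: no hg.link attestation
        # Step 4: D must be available locally.
        d = repo.get(d_node)
        if d is None:
            return INDETERMINATE
        # Step 5: every entry except sig[N] must be identical in C and D.
        earlier = {k: v for k, v in sigs.items() if k != n}
        if signatures(d) != earlier:
            return FAILURE
        c = d
```

Because step 5 requires D to carry exactly one entry fewer than C, each iteration strictly shrinks the trail, so the walk always terminates even if a crafted hg.link points back into the chain.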

Security

The design above ensures the integrity of the audit trail, but not its authenticity. Since the hg.custodian attestation is supplied by the user, it cannot be trusted. To provide authenticity, security may be layered onto the above design by means of additional required attestations.

To support authenticated audit trails out-of-the-box, the gpg extension will be modified to support (at least) one new attestation, based upon the format used by the existing gpg extension:

  attestation     value
  hgext.gpg.sig   $SIGVERSION $GPGSIG (separated by a single space)
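A verifier first has to split the attestation value into its two fields. This sketch assumes, as an unconfirmed guess based on the existing gpg extension's .hgsigs format, that $GPGSIG is the base64-encoded detached signature; what data is signed, and how the signature is checked against a keyring, is left open here. `parse_gpg_attestation` is a hypothetical helper.

```python
import base64
import binascii


def parse_gpg_attestation(value: str):
    """Split an hgext.gpg.sig value into (version, raw signature bytes).

    Assumes "$SIGVERSION $GPGSIG" with $GPGSIG base64-encoded.
    """
    version, sep, encoded = value.partition(" ")
    if not sep or not version or not encoded:
        raise ValueError("expected '$SIGVERSION $GPGSIG'")
    try:
        # validate=True rejects characters outside the base64 alphabet.
        signature = base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError("$GPGSIG is not valid base64") from exc
    return version, signature
```

Keeping the version as an opaque string lets a verifier dispatch on it later, in the same spirit as the existing extension's signature version field.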

To sign a changeset with multiple GPG keys, use a different user for each key to produce multiple signatures.

Additional Policies

TODO(adgar): Attestations covering some or all of the tracked content, such as the manifest ID, so that verifiers can check that custodians did not alter tracked content without performing a full verification (which requires all revisions of a changeset to exist locally).

Required Changes

Changeset "extra" fields are not preserved by many history rewriting operations, apart from special-case support for branches. They are also not included in emails formatted by the patchbomb extension, an important means of changeset exchange.
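To preserve the trail, a rewriting command would need to carry every existing sigK extra over to the new changeset and append a fresh entry linking back to the old ID. The sketch below assumes entries are numbered from 1 when no trail exists yet and are encoded as sorted `name=value` lines; `audit_trail_extras` is a hypothetical helper whose result a caller would merge into whatever extras (such as the branch) the command already keeps.

```python
import re

SIG_KEY_RE = re.compile(r"sig(\d+)")


def audit_trail_extras(old_extra: dict, old_node: str,
                       custodian: str, date: str) -> dict:
    """Audit-trail extras for a rewritten changeset.

    Keeps every existing sigK entry unchanged and appends one new entry
    whose hg.link names the changeset being rewritten.
    """
    kept = {k: v for k, v in old_extra.items() if SIG_KEY_RE.fullmatch(k)}
    orders = [int(SIG_KEY_RE.fullmatch(k).group(1)) for k in kept]
    # Assumption: a trail without entries starts at sig1 (sig0 is the
    # optional author entry and is only ever carried over).
    next_k = max(orders) + 1 if orders else 1
    entry = {"hg.custodian": custodian, "hg.date": date, "hg.link": old_node}
    kept["sig%d" % next_k] = "\n".join(
        "%s=%s" % (name, entry[name]) for name in sorted(entry))
    return kept
```

Copying earlier entries verbatim is essential: the verification procedure fails if any entry other than the newest differs between a changeset and its predecessor.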

AuditTrailPlan (last edited 2015-12-02 17:29:44 by MichaelEdgar)