Use Cases

Prepending History

A company converts a repository to Mercurial but doesn't carry over the history of the original repository. For example, they copy the code from a CVS repository into a Mercurial repository as revision 0. Later, they want to convert the existing history to Mercurial and expose it as a single history while preserving revision nodes (so a flag day isn't required and people don't have to mass-convert repos).

Fixing Bad History / A Form of Commit Censorship

Someone did something wrong: they merged against the wrong parent, or committed something they shouldn't have. People would like a mechanism to amend published history so that bad commits are removed from existence. This could be done by pointing the first good commit after the bad ones at the last good commit before them. For example:

A -> B -> C -> D -> E

Say C is bad. We could rewrite the p1 of D to point to B, omitting C from history.

A -> B -> D -> E

Proposal

The proposal is to create a mechanism that allows "fake parent" data in changelog entries: a changeset is rewritten to refer to a different parent. Alternatively, we could allow a changeset to replace an existing one, e.g. by creating a copy of a changeset whose parents point to the new previous history. The two approaches are logically very similar.

An important property is that descendant changesets retain their original hashes and still verify. Avoiding invalidation of descendant hashes is a requirement, because invalidating them will cause too many problems for co

This will invalidate the hash of the rewritten changeset. However, all descendant changesets will still verify, because their manifests and recorded parent node IDs remain valid.
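A toy sketch of why this holds, assuming Mercurial's node hash (SHA-1 over the two parent node IDs in sorted order, followed by the entry text). The changeset texts are placeholders rather than real changelog entries, and the `parents` dict merely stands in for the parent pointers stored in the revlog index. Rewriting D's stored p1 to B, as in the example above, breaks verification of D alone; E still verifies because it records D's original node ID.

```python
import hashlib

NULLID = b"\0" * 20

def node_hash(text: bytes, p1: bytes, p2: bytes = NULLID) -> bytes:
    # SHA-1 over the sorted parent node IDs, then the entry text.
    a, b = sorted([p1, p2])
    return hashlib.sha1(a + b + text).digest()

# Build the linear history A -> B -> C -> D -> E with placeholder texts.
texts = {name: f"changeset {name}".encode() for name in "ABCDE"}
nodes = {}
parents = {}
prev = NULLID
for name in "ABCDE":
    nodes[name] = node_hash(texts[name], prev)
    parents[name] = prev
    prev = nodes[name]

# "Fake parent": D's stored p1 now points at B, omitting C.
# D keeps its original node ID, so E's recorded parent is unchanged.
parents["D"] = nodes["B"]

def verifies(name: str) -> bool:
    # Recompute the hash from the *stored* parent and compare to the node ID.
    return node_hash(texts[name], parents[name]) == nodes[name]

print(verifies("D"), verifies("E"))  # False True
```

Only the rewritten changeset fails; everything downstream of it is untouched.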

Requirement for Generic Parent Rewriting

We'd prefer to allow rewriting only rev 0 for prepending history. If we did, we could rewrite the parents of rev 0 and compute its hash assuming the parents are nullid, preserving hash verification. Unfortunately, not all repos could be prepended this way. For example, mozilla-central's rev 0 introduces a .hgignore file and rev 1 is the copy of CVS, so rev 0 doesn't have the full manifest. Rewriting rev 0 would therefore produce a massive diff between the last changeset of the prepended history, the initial rev 0, and rev 1 (most files would appear to be deleted in rev 0 and re-added in rev 1). This would interfere with log, blame, etc.
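A sketch of the rev-0-only rule, reusing the same toy hash as an assumption about how verification would be computed. `original_roots` is a hypothetical set of node IDs that were created without parents; for those, verification ignores the stored (fake) parents and hashes against nullid, so the original root can be attached to prepended history without breaking verification.

```python
import hashlib

NULLID = b"\0" * 20

def node_hash(text: bytes, p1: bytes, p2: bytes = NULLID) -> bytes:
    # SHA-1 over the sorted parent node IDs, then the entry text.
    a, b = sorted([p1, p2])
    return hashlib.sha1(a + b + text).digest()

def verifies(text: bytes, node: bytes, p1: bytes, p2: bytes, original_roots: set) -> bool:
    # Hypothetical rule: changesets that were originally roots are hashed as if
    # both parents were nullid, whatever (fake) parents are stored now.
    if node in original_roots:
        p1 = p2 = NULLID
    return node_hash(text, p1, p2) == node

# Original rev 0, created with no parents.
root_text = b"rev 0: add .hgignore"
root = node_hash(root_text, NULLID)

# Tip of the prepended (converted CVS) history.
prepended_tip = node_hash(b"last CVS changeset", NULLID)

print(verifies(root_text, root, prepended_tip, NULLID, {root}))  # True
print(verifies(root_text, root, prepended_tip, NULLID, set()))   # False
```

The rule only covers changesets that were roots, which is exactly the limitation above: it cannot re-parent mozilla-central's rev 1 onto the prepended history.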

Splices Should Be Applied Locally

mpm's suggestion was to store modified/prepended history in a new DAG head/root. Clients would receive a listing of splices (possibly defined in repo history, in a .hgsplices file or similar) and perform the splices locally. Because the stored changesets would never be altered, this would avoid hash verification issues.
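A sketch of applying splices locally. The `.hgsplices` format used here (one `<child> <new-p1>` pair per line, `#` for comments) is a hypothetical illustration, not a defined format, and node IDs are shortened strings. Splicing only produces a logical parent view; the stored parents, and therefore hash verification, are left alone.

```python
from typing import Dict, Tuple

Node = str  # shortened hex node IDs for the example
ParentMap = Dict[Node, Tuple[Node, ...]]

def parse_splices(text: str) -> Dict[Node, Node]:
    """Parse a hypothetical .hgsplices file: one '<child> <new-p1>' pair per line."""
    splices = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        child, new_p1 = line.split()
        splices[child] = new_p1
    return splices

def apply_splices(parents: ParentMap, splices: Dict[Node, Node]) -> ParentMap:
    """Return a logical parent map with p1 replaced per splice; input is not modified."""
    view = dict(parents)
    for child, new_p1 in splices.items():
        if child not in view:
            continue  # this clone doesn't have the changeset; nothing to splice
        old = view[child]
        # Replace p1 (or add one for a root), keep p2 if present.
        view[child] = (new_p1,) + old[1:]
    return view

# Prepended history h1 -> h2 lives in its own DAG root; "a" is the original rev 0.
stored = {"h1": (), "h2": ("h1",), "a": (), "b": ("a",)}
view = apply_splices(stored, parse_splices("# prepend converted CVS history\na h2\n"))
print(view["a"])    # ('h2',)
print(stored["a"])  # ()
```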

There are potential complications with discovery (e.g. clients not supporting splice application pulling down unrelated heads).

Existing Clones Continue Working / New Clones Fetch Prepended/Rewritten History

If history is prepended on the server, existing clones that run hg pull will continue to pull the original history; the prepended history will never be pulled. However, if a new clone is performed (by a client that supports "fake parent history"), it will pull down the new, full history.

Client Side Requirement

We will likely need to introduce a repository requirement (an entry in .hg/requires) on repos cloned with "fake parent history." This is necessary so that legacy clients not supporting "fake parent history" don't fail when they encounter a changeset with fake parent history. It isn't strictly necessary when exchanging data with a remote running a modern version, but it is necessary when an older Mercurial version or process accesses the local repo directly.
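Mercurial already refuses to open a repository whose .hg/requires file lists a requirement it does not know, which is what makes a requirement a safe gate here. The sketch below mimics that check in isolation; `fakeparents` is a hypothetical requirement name, and the supported set is illustrative rather than the exact list of any Mercurial release.

```python
class UnsupportedRequirementError(Exception):
    """Raised when a repository needs features this client does not implement."""

# Illustrative subset of requirements an older client understands.
LEGACY_SUPPORTED = {"revlogv1", "store", "fncache", "dotencode"}

def check_requirements(requires_text: str, supported: set) -> None:
    # .hg/requires holds one requirement name per line.
    required = {line.strip() for line in requires_text.splitlines() if line.strip()}
    missing = sorted(required - supported)
    if missing:
        raise UnsupportedRequirementError(
            "repository requires features unknown to this client: " + ", ".join(missing)
        )

# A repo cloned with fake parent history would carry the extra (hypothetical) entry.
requires = "dotencode\nfncache\nrevlogv1\nstore\nfakeparents\n"
try:
    check_requirements(requires, LEGACY_SUPPORTED)
except UnsupportedRequirementError as err:
    print(err)  # repository requires features unknown to this client: fakeparents
```

The old client stops before reading any changelog data instead of misinterpreting fake parents.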

Security Considerations

Introducing fake history into a repository is dangerous because it could be abused, possibly maliciously, to change repository contents without users noticing.

For this reason, changeset rewrites should probably not transfer to clients automatically. If they do transfer automatically, there should be a mechanism to disable the feature, and we should likely always report when rewrites have been seen so users are aware that tampering may be involved.



FakeParentPlan (last edited 2016-03-20 21:27:55 by GregorySzorc)