Revlog parent deltas
The RevlogNG file format has a few known weaknesses, one of which is the lack of parent deltas. That is, a new revision is always stored as a delta against the revision directly before it in the revlog, when in many cases it makes much more sense to diff against the parent revision instead (e.g. when the two are on very different branches). This makes some revlogs very space-inefficient, particularly in very branchy repositories or in repositories that have been converted from some other VCS. The effect can be seen in the size reduction achieved by "manually" reordering a revlog, particularly the manifest revlog in larger repositories.
1. Implementation
It would be nice to reduce at least some of the inefficiency by being able to delta against the parent revision in cases where that's an easy win. In RevlogNG, a revision is reconstructed by taking the (specified) base revision and applying the deltas of all revisions after that base, in a contiguous block, through to the requested revision. This means there's little seek time (seeks, being dependent on disk rotation speeds, are particularly expensive and resistant to Moore's law). Any algorithm for parent deltas should not increase the number of seeks too much.
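The reconstruction described above can be sketched as a toy model. This is not Mercurial's code: the `(base, payload)` entry layout, the `(start, end, replacement)` delta format, and the names `apply_delta` and `reconstruct` are all assumptions made for illustration.

```python
# Toy model of chain reconstruction; the data layout is hypothetical.
# A delta is a list of (start, end, replacement) edits against the
# text of the revision directly before it.

def apply_delta(text, delta):
    """Apply non-overlapping (start, end, replacement) edits to text."""
    out = []
    pos = 0
    for start, end, replacement in sorted(delta):
        out.append(text[pos:start])   # unchanged bytes before the edit
        out.append(replacement)
        pos = end                     # skip the bytes that were replaced
    out.append(text[pos:])
    return b"".join(out)


def reconstruct(entries, rev):
    """Rebuild revision `rev` from its base snapshot plus contiguous deltas.

    entries[i] is (base, payload). The payload is a full snapshot when
    base == i, otherwise a delta against revision i - 1.
    """
    base, _ = entries[rev]
    text = entries[base][1]               # full snapshot at the chain base
    for r in range(base + 1, rev + 1):    # one contiguous block: base+1 .. rev
        text = apply_delta(text, entries[r][1])
    return text


entries = [
    (0, b"hello world"),          # rev 0: full snapshot
    (0, [(0, 5, b"howdy")]),      # rev 1: delta against rev 0
    (0, [(6, 11, b"there")]),     # rev 2: delta against rev 1
]
```

Because every revision from the base through the requested one is read in order, the whole chain can be fetched with a single sequential read in this model.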
2. Strategy
In current revlogs, the implicit revision to delta against is the previous revision. If there are cases in which we don't want to do that anymore, we should do one of the following:
- always use one of the revision's parents
- use some of the unused bits to designate a parent or switch parent selection
- add a field that specifies the delta base revision
Of course, we also need all the backwards compatibility trappings (e.g. a requires entry, wire protocol support).
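To make the seek concern concrete, here is a minimal sketch of what following an explicit delta base would look like, assuming the third option above. The `delta_base` list and the functions `delta_chain` and `count_seeks` are hypothetical names, not part of any existing revlog API, and the seek count ignores the initial seek to the chain base.

```python
def delta_chain(delta_base, rev):
    """Return the revisions to read, oldest first, to rebuild `rev`.

    delta_base[r] is the revision that r is stored as a delta against;
    delta_base[r] == r marks a full snapshot.
    """
    chain = [rev]
    while delta_base[chain[-1]] != chain[-1]:
        chain.append(delta_base[chain[-1]])
    chain.reverse()
    return chain


def count_seeks(chain):
    """Count reads that do not directly follow the previous read on disk."""
    return sum(1 for prev, cur in zip(chain, chain[1:]) if cur != prev + 1)


# Current behaviour: every revision deltas against the one before it.
previous_rev_base = [0, 0, 1, 2, 3, 4]

# Two interleaved branches, each revision delta'ed against its parent.
parent_base = [0, 0, 0, 1, 2, 3]
```

In this toy layout, rebuilding revision 5 with previous-revision deltas reads revisions 0 through 5 in one contiguous block, while the parent-delta layout reads only 0, 1, 3 and 5 but has to skip over the revisions in between. A real implementation would need to weigh the space saved against those extra jumps.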
3. Literature
revlog experiments mailing list thread (February 2006)
more revlog experiments mailing list thread (March 2006)