Differences between revisions 10 and 11
Revision 10 as of 2010-04-04 11:24:55
Size: 3914
Editor: tonfa
Comment: add headers to the table
Revision 11 as of 2010-07-31 09:44:44
Size: 3918
Editor: Pradeepkumar
Deletions are marked like this. Additions are marked like this.
Line 2: Line 2:
Line 6: Line 5:
Line 12: Line 10:
Line 14: Line 11:
Line 22: Line 18:
Line 31: Line 26:
||||<style="text-align: center;" |2> ||<style="text-align: center;" |2>'''old client''' ||||<style="text-align: center;">'''new client''' ||
||'''old repo''' ||'''new repo''' ||
||||<style="text-align: center;">'''old server''' ||linear ||linear ||linear[4]a ||
||<style="text-align: center;" |2>'''new server''' ||'''old repo''' ||linear ||linear[2] ||linear[1][3][4]b ||
||'''new repo''' ||linear[1] ||parent[1][4] ||parent ||
Line 32: Line 32:
||||<|2> ||<|2>'''old client'''||||'''new client''' ||
|| '''old repo'''||'''new repo''' ||
||||'''old server''' ||linear ||linear ||linear[4]a ||
||<|2>'''new server'''||'''old repo'''||linear ||linear[2] ||linear[1][3][4]b||
|| '''new repo'''||linear[1] ||parent[1][4] ||parent ||

Line 45: Line 42:

Line 59: Line 53:
Line 63: Line 56:
 * This is ongoing work for GSoC 2010. Progress can be seen [[Pradeepkumar|here]].

Revlog parent deltas

The RevlogNG file format has a few known weaknesses, one of which is the lack of parent deltas. That is, new revisions are always delta'ed against the revision directly before it, when in many cases it makes much more sense to diff against the parent revision instead (e.g. when they're on very different branches). This makes some revlogs very space-inefficient, particularly in very branchy repositories or in repositories that have been converted from some other VCS. This can be seen from the reduction caused by "manual" reordering of the revlog, particularly for the manifest revlog in larger repositories.

1. Implementation

It would be nice to reduce at least some of the inefficiency by being able to delta against the parent revision in cases where that's an easy win. In RevlogNG, revisions can be constructed from taking the (specified) base revision and applying all revisions from that base revision, in a contiguous block, through to the requested revision. This means there's little seek time (seeks, being dependent on disk rotation speeds, are particularly expensive and resistent to Moore's law). Any algorithm for parent deltas should not expand the number of seeks too much, this means the distance between the offset of the base rev and tip should be smaller than a certain amount.

The case of merge revision should be investigated, should we always take the first parent, or can we allow both?

2. Strategy

2.1. Disk format changes

In current revlogs, the implicit rev to delta from is the former revision. If there are cases in which we don't want to do that anymore, we will:

  • use the first unused bit (flag equal to 0x1) to specify that the diff is against the revision specified in base (usually it would be one of the parent)

Of course, we also need all the backwards compatibility trappings (e.g. requires, wire protocol stuff).

2.2. Wire protocol changes

For efficiency (to avoid re-diffing every rev), the WireProtocol can be changed to allow a different changegroup format:

  • either always send a diff vs. the parent
  • or add have a bit that specify the kind of delta

If all the delta from the changegroup are using diff vs. parent, reordering on the fly during a pull/push/clone might be much cheaper (less delta to recompute).

Backward compatibility matrix:

old client

new client

old repo

new repo

old server




new server

old repo




new repo




[1] manifest and filelogs reordered by server on the fly

[2] new bundle format

[3] server reuses deltas from revlogs wherever possible, with reordering, may send parent deltas for deltas it generates at run time, client reuses delta whenever possible except when a config option is activated (which means the clients repo will always stay fully parent-delta-ified).

[4] client redeltas

Note: It might come in handy if we had the following amongst the delta base options:

  • previous in chain,
  • parent1,
  • parent2, or
  • none (meaning we send the full text).

This would help with shallow clones if they ever materialize.

3. Literature

ParentDeltaPlan (last edited 2015-03-08 04:32:20 by KevinBullock)