[RFC] [revlog] parent-delta

Benoit Boissinot benoit.boissinot at ens-lyon.org
Tue Sep 1 02:15:45 CDT 2009


On Mon, Aug 31, 2009 at 08:19:53PM -0500, Matt Mackall wrote:
> On Tue, 2009-09-01 at 01:57 +0200, Benoit Boissinot wrote:
> > After playing with the topo-sort algorithm for revlog shrinking [1],
> > I've build a script to emulate the result of parent delta [2] on big
> > revlogs.
> > 
> > 
> > parent-delta manifest with a "read factor" of 6:
> > 363655853 bytes ( 346.8 MiB)
> > 
> > parent-delta manifest with a "read factor of 4: 
> > 464570388 bytes ( 443.0 MiB)
> 
> Hmm, that's a bit disappointing. In the limit of high read factors, this
> should do better than toposort (better because at infinity, no full
> versions are needed). But I was hoping that we'd converge on toposort
> compression levels with read factors of around this size. 
> 
> In highly linear manifest graphs, runs of thousands of small deltas are
> between full versions are common, even with an 'uncompressed read
> factor' of only 2. Unfortunately, the kernel has evolved a development
> style where it regularly holds off merges for months, then merges 10k
> csets from dozens of different branches all at once, so each branch
> merge has a large revision gap to the conceptually nearer of its
> parents. 
> 
> How large does the read factor have to be to be comparable to toposort?
> It'd be interesting to get 4 or 5 points on this curve (2, 4, 6, 10,
> 16?). I suspect at some factor we'll get to a point where the 'merge
> window' fits inside the 'read factor' and compression will suddenly jump
> substantially.

Here are the results for more points:
read factor:  2x:  784324298 bytes (748.0 MiB) / 673 full, 3576 tip, 154023 parents
read factor:  3x:  557458333 bytes (531.6 MiB) / 319 full, 2988 tip, 154965 parents
read factor:  4x:  464570388 bytes (443.0 MiB) / 202 full, 2534 tip, 155536 parents
read factor:  5x:  402370635 bytes (383.7 MiB) / 141 full, 2162 tip, 155969 parents
read factor:  6x:  363655853 bytes (346.8 MiB) / 108 full, 1974 tip, 156190 parents
read factor:  7x:  327589135 bytes (312.4 MiB) /  85 full, 1716 tip, 156471 parents
read factor:  8x:  283813734 bytes (270.7 MiB) /  66 full, 1459 tip, 156747 parents
read factor:  9x:  301404051 bytes (287.4 MiB) /  61 full, 1593 tip, 156618 parents
read factor: 10x:  283411292 bytes (270.3 MiB) /  53 full, 1447 tip, 156772 parents
read factor: 11x:  270289465 bytes (257.8 MiB) /  46 full, 1300 tip, 156926 parents
read factor: 12x:  230440255 bytes (219.8 MiB) /  38 full, 1104 tip, 157130 parents
read factor: 13x:  199633336 bytes (190.4 MiB) /  32 full,  913 tip, 157327 parents
read factor: 14x:  216688491 bytes (206.7 MiB) /  31 full,  940 tip, 157301 parents
read factor: 15x:  222788215 bytes (212.5 MiB) /  29 full, 1010 tip, 157233 parents
read factor: 16x:  204505479 bytes (195.0 MiB) /  25 full,  866 tip, 157381 parents
read factor: 17x:  202987010 bytes (193.6 MiB) /  25 full,  807 tip, 157440 parents
read factor: 18x:  159966363 bytes (152.6 MiB) /  21 full,  630 tip, 157621 parents
read factor: 19x:  175856108 bytes (167.7 MiB) /  19 full,  673 tip, 157580 parents
read factor: 20x:  139092942 bytes (132.6 MiB) /  16 full,  502 tip, 157754 parents
read factor: 21x:  136019641 bytes (129.7 MiB) /  15 full,  473 tip, 157784 parents
read factor: 22x:  107451218 bytes (102.5 MiB) /  13 full,  287 tip, 157972 parents
read factor: 23x:  111780627 bytes (106.6 MiB) /  13 full,  286 tip, 157973 parents
read factor: 24x:   97257884 bytes ( 92.8 MiB) /  11 full,  237 tip, 158024 parents
read factor: 25x:  119240008 bytes (113.7 MiB) /  13 full,  326 tip, 157933 parents
read factor: 26x:  125427730 bytes (119.6 MiB) /  13 full,  405 tip, 157854 parents
read factor: 27x:  110754826 bytes (105.6 MiB) /  11 full,  352 tip, 157909 parents
read factor: 28x:   90371826 bytes ( 86.2 MiB) /  10 full,  176 tip, 158086 parents
read factor: 29x:   97862756 bytes ( 93.3 MiB) /   9 full,  184 tip, 158079 parents
read factor: 30x:  121359803 bytes (115.7 MiB) /  12 full,  306 tip, 157954 parents

For comparison, with topo sort (and tip-delta):

177162218 bytes (168.9 MiB) / 98 full, 8200 tip, 149854 parents (tip is in parents)

Note: the size is not monotone depending on the read factor. Should I
      play with a read factor depending on the uncompressed version
      instead?
> 
> Note here that the compression of revision x can depend pretty
> sensitively on the compression of revision x-1, especially if they're on
> different branches. If x-1 gets stored as a full revision, this might
> force x, x+1, x+2 to store full revisions as well, as their base
> revisions are now too far away for delta storage.

My algorithm fallbacks to tip-delta if parent-delta doesn't fit.
> 
> Also note that it might make sense to make read factor a tunable. On
> SSD, seek time and transfer rate are much smaller issues than disk space
> usage (at least for now).

If you want to trade CPU for space and bandwidth, then on the fly
reordering should help a lot, at least for the people who pulls after a
while, or for fresh clones.

regards,

Benoit

-- 
:wq


More information about the Mercurial-devel mailing list