[RFC] [revlog] parent-delta
Benoit Boissinot
benoit.boissinot at ens-lyon.org
Tue Sep 1 02:15:45 CDT 2009
On Mon, Aug 31, 2009 at 08:19:53PM -0500, Matt Mackall wrote:
> On Tue, 2009-09-01 at 01:57 +0200, Benoit Boissinot wrote:
> > After playing with the topo-sort algorithm for revlog shrinking [1],
> > I've build a script to emulate the result of parent delta [2] on big
> > revlogs.
> >
> >
> > parent-delta manifest with a "read factor" of 6:
> > 363655853 bytes ( 346.8 MiB)
> >
> > parent-delta manifest with a "read factor of 4:
> > 464570388 bytes ( 443.0 MiB)
>
> Hmm, that's a bit disappointing. In the limit of high read factors, this
> should do better than toposort (better because at infinity, no full
> versions are needed). But I was hoping that we'd converge on toposort
> compression levels with read factors of around this size.
>
> In highly linear manifest graphs, runs of thousands of small deltas are
> between full versions are common, even with an 'uncompressed read
> factor' of only 2. Unfortunately, the kernel has evolved a development
> style where it regularly holds off merges for months, then merges 10k
> csets from dozens of different branches all at once, so each branch
> merge has a large revision gap to the conceptually nearer of its
> parents.
>
> How large does the read factor have to be to be comparable to toposort?
> It'd be interesting to get 4 or 5 points on this curve (2, 4, 6, 10,
> 16?). I suspect at some factor we'll get to a point where the 'merge
> window' fits inside the 'read factor' and compression will suddenly jump
> substantially.
Here are the results for more points:
read factor: 2x: 784324298 bytes (748.0 MiB) / 673 full, 3576 tip, 154023 parents
read factor: 3x: 557458333 bytes (531.6 MiB) / 319 full, 2988 tip, 154965 parents
read factor: 4x: 464570388 bytes (443.0 MiB) / 202 full, 2534 tip, 155536 parents
read factor: 5x: 402370635 bytes (383.7 MiB) / 141 full, 2162 tip, 155969 parents
read factor: 6x: 363655853 bytes (346.8 MiB) / 108 full, 1974 tip, 156190 parents
read factor: 7x: 327589135 bytes (312.4 MiB) / 85 full, 1716 tip, 156471 parents
read factor: 8x: 283813734 bytes (270.7 MiB) / 66 full, 1459 tip, 156747 parents
read factor: 9x: 301404051 bytes (287.4 MiB) / 61 full, 1593 tip, 156618 parents
read factor: 10x: 283411292 bytes (270.3 MiB) / 53 full, 1447 tip, 156772 parents
read factor: 11x: 270289465 bytes (257.8 MiB) / 46 full, 1300 tip, 156926 parents
read factor: 12x: 230440255 bytes (219.8 MiB) / 38 full, 1104 tip, 157130 parents
read factor: 13x: 199633336 bytes (190.4 MiB) / 32 full, 913 tip, 157327 parents
read factor: 14x: 216688491 bytes (206.7 MiB) / 31 full, 940 tip, 157301 parents
read factor: 15x: 222788215 bytes (212.5 MiB) / 29 full, 1010 tip, 157233 parents
read factor: 16x: 204505479 bytes (195.0 MiB) / 25 full, 866 tip, 157381 parents
read factor: 17x: 202987010 bytes (193.6 MiB) / 25 full, 807 tip, 157440 parents
read factor: 18x: 159966363 bytes (152.6 MiB) / 21 full, 630 tip, 157621 parents
read factor: 19x: 175856108 bytes (167.7 MiB) / 19 full, 673 tip, 157580 parents
read factor: 20x: 139092942 bytes (132.6 MiB) / 16 full, 502 tip, 157754 parents
read factor: 21x: 136019641 bytes (129.7 MiB) / 15 full, 473 tip, 157784 parents
read factor: 22x: 107451218 bytes (102.5 MiB) / 13 full, 287 tip, 157972 parents
read factor: 23x: 111780627 bytes (106.6 MiB) / 13 full, 286 tip, 157973 parents
read factor: 24x: 97257884 bytes ( 92.8 MiB) / 11 full, 237 tip, 158024 parents
read factor: 25x: 119240008 bytes (113.7 MiB) / 13 full, 326 tip, 157933 parents
read factor: 26x: 125427730 bytes (119.6 MiB) / 13 full, 405 tip, 157854 parents
read factor: 27x: 110754826 bytes (105.6 MiB) / 11 full, 352 tip, 157909 parents
read factor: 28x: 90371826 bytes ( 86.2 MiB) / 10 full, 176 tip, 158086 parents
read factor: 29x: 97862756 bytes ( 93.3 MiB) / 9 full, 184 tip, 158079 parents
read factor: 30x: 121359803 bytes (115.7 MiB) / 12 full, 306 tip, 157954 parents
For comparison, with topo sort (and tip-delta):
177162218 bytes (168.9 MiB) / 98 full, 8200 tip, 149854 parents (tip is in parents)
Note: the size is not monotone depending on the read factor. Should I
play with a read factor depending on the uncompressed version
instead?
>
> Note here that the compression of revision x can depend pretty
> sensitively on the compression of revision x-1, especially if they're on
> different branches. If x-1 gets stored as a full revision, this might
> force x, x+1, x+2 to store full revisions as well, as their base
> revisions are now too far away for delta storage.
My algorithm fallbacks to tip-delta if parent-delta doesn't fit.
>
> Also note that it might make sense to make read factor a tunable. On
> SSD, seek time and transfer rate are much smaller issues than disk space
> usage (at least for now).
If you want to trade CPU for space and bandwidth, then on the fly
reordering should help a lot, at least for the people who pulls after a
while, or for fresh clones.
regards,
Benoit
--
:wq
More information about the Mercurial-devel
mailing list