[RFC] [revlog] parent-delta
benoit.boissinot at ens-lyon.org
Mon Aug 31 18:57:22 CDT 2009
After playing with the topo-sort algorithm for revlog shrinking ,
I've build a script to emulate the result of parent delta  on big
The attached script can be used to dump information about a revlog in
the following way:
$ python parent-delta.py dump .hg/store/00manifest.i > dump
This will create a file with some information about the compressed size
of each revision, compressed size of the delta to each parent and the
compressed size of the delta to tip.
$ python parent-delta.py emul < dump
the script will output the estimated size of the revlog if we were using
Currently the emulation of delta parent works in the following way:
- it computes all the possibles deltas (parent1, parent2 and tip)
- if the distance between the of the base delta chain is higher than a
certain threshold ("read factor", the value can be changed in the
source) it will discard the potential delta.
For example a threshold of 6 will only allow a given delta if the
distance between the base offset and the current offset is smaller
than 6 times the compressed full version.
- from the remaining deltas, it chooses the smallest or insert a full
Results with the linux kernel manifest (158k revs, about 29k files):
704803499 bytes (672 MiB)
dump is about 14MB and took some 4 hours to generate (I can try to put
it online if some people are interested).
parent-delta manifest with a "read factor" of 6:
363655853 bytes ( 346.8 MiB)
parent-delta manifest with a "read factor of 4:
464570388 bytes ( 443.0 MiB)
Notes about read factor:
In our current implementation of tip-delta, the read factor was given in
term of the uncompressed size of the full rev. Can someone give some
insights to explain if using the compressed size is more (or less)
Notes about the topo-sort+parent-delta:
I need to check the percentage of parent-delta that gets inserted (vs.
full revs or tip-delta). But if they are high enough and if we change
the wire protocol to allow parent-delta as well, it would be much
cheaper to do topo-sort on the fly since deltas would not be recomputed
for every rev (as is the case today if we try to do topo-sort during
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 2225 bytes
Desc: not available
Url : http://selenic.com/pipermail/mercurial-devel/attachments/20090901/1fcfac54/attachment.py
More information about the Mercurial-devel