Convert to parentdelta

Dan Villiom Podlaski Christiansen danchr at gmail.com
Sun Aug 22 06:41:17 CDT 2010


On 21 Aug 2010, at 16:19, Matt Mackall wrote:

> On Sat, 2010-08-21 at 16:10 +0200, Dan Villiom Podlaski Christiansen
> wrote:
>> On 21 Aug 2010, at 14:09, Pradeepkumar Gayam wrote:
>>
>>> On Sat, Aug 21, 2010 at 5:14 PM, Dan Villiom Podlaski Christiansen <
>>> danchr at gmail.com> wrote:
>>>
>>>> On 21 Aug 2010, at 13:36, Pradeepkumar Gayam wrote:
>>>>
>>>>> That didn't cross my mind. `hg clone source dest --parent-delta` to
>>>>> convert might be a good idea. Any suggestions?
>>>>
>>>> Personally, I'd expect
>>>> ‘hg clone --config format.parentdelta=1 --pull source dest’
>>>
>>> This command doesn't completely convert the repo; it only partially
>>> converts it. If you try both methods you can see the difference.
>>
>> My point was that it ought to. IMHO you should detect this situation
>> and then do whatever magic it is your extension currently does :)
>
> There's still a lot of work to do before this can happen. The current
> parentdelta code is basically a proof of concept that compression
> improves. We still need to rewrite the wire protocol bits that push
> changesets over the wire to support this.

Ah, I see. I noticed that ‘compress’ doesn't create parent deltas  
unless they're enabled in the configuration. Perhaps it should  
override this setting?

Also, I've done some measurements of the effect of parent deltas.
First, I converted the Python repository twice using hgsubversion,
once with parent deltas enabled (‘parentdelta’ below) and once without
(‘normal’). Then I made a copy of the normal clone and compressed its
largest revlogs using the shrink extension in contrib (‘shrunk’). I
also compressed the normal clone twice: once using an unmodified
revlog.py (‘compressed’), and once with ‘dist > len(text) * 2’
replaced by ‘dist > len(text) * 8’ at line 1204 (‘compressed8’).
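
To make the one-line change concrete, here is a minimal sketch of the
check as I understand it: once the distance exceeds some multiple of
the uncompressed text length, a full version is stored instead of a
delta. The function name and the sample numbers are hypothetical; only
the comparison itself comes from the text above.

```python
def should_store_fulltext(dist, text_len, factor=2):
    """Return True when the delta distance is large relative to the text.

    dist     -- distance in bytes used for the comparison
    text_len -- length of the uncompressed revision text
    factor   -- 2 in the unmodified check, 8 in the modified one
    """
    return dist > text_len * factor


# Hypothetical numbers, chosen only to show where the two factors differ.
text_len = 1000
dist = 5000

print(should_store_fulltext(dist, text_len, factor=2))  # True
print(should_store_fulltext(dist, text_len, factor=8))  # False
```

With the larger factor, longer delta chains are tolerated before a full
version is written again.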

The resulting store sizes:

normal:     1700MB
parentdelta: 389MB
compressed:  386MB
shrunk:      269MB
compressed8: 229MB

The resulting 00manifest.d sizes:

normal:     1342MB
parentdelta: 161MB
compressed:  161MB
shrunk:       31MB
compressed8:  19MB
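
For scale, the figures above can be expressed relative to the normal
clone. This is a small sketch that uses only the sizes listed above;
the helper name percent_of_normal is made up for the example.

```python
# Sizes in MB, copied from the two lists above.
store_mb = {
    "normal": 1700,
    "parentdelta": 389,
    "compressed": 386,
    "shrunk": 269,
    "compressed8": 229,
}
manifest_mb = {
    "normal": 1342,
    "parentdelta": 161,
    "compressed": 161,
    "shrunk": 31,
    "compressed8": 19,
}


def percent_of_normal(sizes):
    """Express every size as a percentage of the 'normal' entry."""
    baseline = sizes["normal"]
    return {name: round(100 * size / baseline, 1) for name, size in sizes.items()}


for label, sizes in (("store", store_mb), ("00manifest.d", manifest_mb)):
    print(label)
    for name, pct in percent_of_normal(sizes).items():
        print(f"  {name:<12} {pct:5.1f}%")
```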

As can be seen, the current implementation results in fairly good  
compression, but with room for improvement. My guess is this is caused  
by a slight bias against parent deltas in the current code.  
Specifically, the distance that is used for comparing against the raw  
text is calculated like this:

dist = l + offset - self.start(base)

It seems to me that this isn't terribly meaningful for parent deltas.  
I suspect calculating the actual distance would be somewhat costly, so  
perhaps it would be better to store the actual distance to base either  
alongside or instead of the base revision?
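
A toy model of the difference, to illustrate the bias. This is not
revlog.py: the Entry class and both function names are invented for
the example, and it assumes that ‘offset’ is the end of the last stored
revision and that the "actual distance" is the sum of the stored
lengths along the delta chain.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    start: int                   # byte offset of the stored data
    length: int                  # stored length in bytes
    delta_parent: Optional[int]  # revision the delta applies to; None = full text


def span_distance(entries: List[Entry], base: int, new_len: int) -> int:
    """Mirror of 'dist = l + offset - self.start(base)' in the toy model."""
    offset = entries[-1].start + entries[-1].length  # assumed: end of last revision
    return new_len + offset - entries[base].start


def chain_distance(entries: List[Entry], parent: int, new_len: int) -> int:
    """Bytes actually read to rebuild the new revision: walk the delta chain."""
    total = new_len
    rev: Optional[int] = parent
    while rev is not None:
        total += entries[rev].length
        rev = entries[rev].delta_parent
    return total


# Revision 0 is a 1000-byte full text; revisions 1..10 form one branch.
entries = [Entry(start=0, length=1000, delta_parent=None)]
for _ in range(10):
    last = entries[-1]
    entries.append(Entry(last.start + last.length, 100, len(entries) - 1))

# Revision 11 starts a second branch as a delta directly against revision 0.
last = entries[-1]
entries.append(Entry(last.start + last.length, 100, 0))

new_len, text_len = 100, 1000  # hypothetical new delta on top of revision 11
span = span_distance(entries, base=0, new_len=new_len)
chain = chain_distance(entries, parent=11, new_len=new_len)
print(span, span > text_len * 2)    # 2200 True
print(chain, chain > text_len * 2)  # 1200 False
```

In this model the span counts the ten unrelated deltas stored in
between, so the check would force a full version even though the chain
itself is short. Walking the chain on every write costs time
proportional to its length, which is why storing a running total per
revision might be the cheaper option.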

--

Dan Villiom Podlaski Christiansen
danchr at gmail.com
