Convert to parentdelta
Dan Villiom Podlaski Christiansen
danchr at gmail.com
Sun Aug 22 11:45:43 CDT 2010
On 22 Aug 2010, at 16:39, Matt Mackall wrote:
> On Sun, 2010-08-22 at 13:41 +0200, Dan Villiom Podlaski Christiansen
> wrote:
>> The resulting 00manifest.d sizes:
>>
>> normal: 1342MB
>> parentdelta: 161MB
>> compressed: 161MB
>> shrunk: 31MB
>> compressed8: 19MB
>
> compressed+shrunk might be interesting too.
> Can you send us results for manifest compression between 2x and 8x?
Sure, here's an updated listing:
normal: 1342MB
parentdelta: 161MB
compressed2: 161MB
compressed3: 49MB
compressed4: 32MB
shrunk: 31MB
compressed2-shrunk: 31MB
compressed4-shrunk: 29MB
compressed6: 24MB
shrunk-compressed2: 22MB
compressed8: 19MB
(Shrunk & compressed are listed in the order performed. I only did
shrunk-compressed2 in that order, as compression is *very*
CPU-intensive.)
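
For a rough sense of scale, the sketch below expresses each size as a reduction factor relative to the normal manifest. The sizes are the ones listed above; the variable names are only illustrative.

```python
# Sizes of 00manifest.d in MB, copied from the listing above.
sizes_mb = {
    "normal": 1342,
    "parentdelta": 161,
    "compressed2": 161,
    "compressed3": 49,
    "compressed4": 32,
    "shrunk": 31,
    "compressed2-shrunk": 31,
    "compressed4-shrunk": 29,
    "compressed6": 24,
    "shrunk-compressed2": 22,
    "compressed8": 19,
}

baseline = sizes_mb["normal"]

# Reduction factor: how many times smaller than the normal manifest.
ratios = {name: baseline / size for name, size in sizes_mb.items()}

for name, size in sizes_mb.items():
    print(f"{name:20s} {size:5d}MB  {ratios[name]:5.1f}x")
```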
>> As can be seen, the current implementation results in fairly good
>> compression, but with room for improvement. My guess is this is caused
>> by a slight bias against parent deltas in the current code.
>> Specifically, the distance that is used for comparing against the raw
>> text is calculated like this:
>>
>> dist = l + offset - self.start(base)
>>
>> It seems to me that this isn't terribly meaningful for parent deltas.
>> I suspect calculating the actual distance would be somewhat costly, so
>> perhaps it would be better to store the actual distance to base either
>> alongside or instead of the base revision?
>
> Not sure what you mean here, as this is the "actual distance": the
> amount of the disk needing to be read to pull this data in. Perhaps you
> mean "sum of length of deltas we need to read in". That number is less
> interesting - we really want to read this all in with one read request.
> Otherwise, it'll degrade into possibly thousands of blocking
> seek()/read() ops where we'll be waiting on I/O and getting rescheduled
> between each.
>
> That scale factor is important too. Making retrieval of all files take
> four times longer and four times as much memory isn't something we
> should do lightly. But it's worth investigating, especially in the case
> of the manifest.
Ah, I see. I wasn't aware that Mercurial would have to parse all the
intermediate revisions :)
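
The distinction drawn above, between the contiguous span read from disk and the sum of the delta lengths actually needed, can be sketched as follows. This is only an illustration under assumed conditions: the index layout and the helper names read_span and delta_sum are hypothetical and are not Mercurial's revlog API.

```python
# Hypothetical index: one (offset, length) pair per revision, with the
# revisions stored back to back on disk in revision order.

def read_span(index, base, rev):
    """Bytes covered by a single contiguous read from the start of
    `base` through the end of `rev` (the quoted `dist` calculation)."""
    start_of_base = index[base][0]
    offset, length = index[rev]
    return length + offset - start_of_base


def delta_sum(index, chain):
    """Total length of only those revisions that are in the delta chain."""
    return sum(index[r][1] for r in chain)


# Five revisions of 100 bytes each, laid out contiguously.
index = [(i * 100, 100) for i in range(5)]

# With parent deltas, revision 4 might chain back to revision 0 through
# revision 2 only, skipping revisions 1 and 3 on another branch.
chain = [0, 2, 4]

span = read_span(index, base=0, rev=4)
total = delta_sum(index, chain)
print(span, total)
```

In this toy layout the span is larger than the sum of the chain's deltas, because a single read also pulls in the unrelated revisions stored between them. Reading only the chain's entries would mean one seek()/read() pair per entry instead.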
--
Dan Villiom Podlaski Christiansen
danchr at gmail.com