[PATCH V2] generaldelta: initialize basecache properly

Tim Delaney timothy.c.delaney at gmail.com
Sat Sep 21 18:47:21 CDT 2013


On 22 September 2013 08:06, Matt Mackall <mpm at selenic.com> wrote:

> On Sun, 2013-09-22 at 05:01 +1000, Tim Delaney wrote:
> > Is there a reason generaldelta is still undocumented and/or not yet the
> > default format? It's been in since Mercurial 1.9 and I only found out
> > about it thanks to this patch. My work repo (~55000 changesets, mostly
> > from SVN via hgsubversion) has had its 00manifest.d reduced from ~1.4GB
> > to ~28MB by:
>
> The reason is that you could still end up sending that 1.4GB over the
> wire =and= taking substantially more CPU than before... because the wire
> protocol can only do linear deltas and thus will have to recompute the
> deltas for the old format. This will be fixed when we get the new bundle
> format figured out.
>
> You might find that a standard clone of your generaldelta repo is
> smaller than your original repo.
>
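
To make the point about linear deltas concrete, here is a rough Python sketch (not Mercurial code). It assumes, as in the old changegroup format, that each revision on the wire is a delta against the revision sent just before it, with the first one deltified against its p1. The names `wire_bases` and `count_recomputed` and the toy history are hypothetical, and a full snapshot is modelled as a delta against the null revision.

```python
from typing import Dict, List, Tuple

NULLREV = -1  # stands in for the null revision

Parents = Dict[int, Tuple[int, int]]


def wire_bases(order: List[int], parents: Parents) -> Dict[int, int]:
    """Delta base each revision gets in a linear-delta stream:
    the previously sent revision, or its p1 for the first one."""
    bases = {}
    prev = None
    for rev in order:
        bases[rev] = parents[rev][0] if prev is None else prev
        prev = rev
    return bases


def count_recomputed(order: List[int], parents: Parents,
                     stored_base: Dict[int, int]) -> int:
    """Count revisions whose stored (generaldelta) base differs from the
    base the stream forces on them, i.e. deltas that must be recomputed."""
    wire = wire_bases(order, parents)
    return sum(1 for rev in order if stored_base[rev] != wire[rev])


# Hypothetical toy history: two branches off rev 0, committed interleaved.
# Branch A is 1 -> 3 -> 5, branch B is 2 -> 4 -> 6.
parents = {0: (NULLREV, NULLREV), 1: (0, NULLREV), 2: (0, NULLREV),
           3: (1, NULLREV), 4: (2, NULLREV), 5: (3, NULLREV), 6: (4, NULLREV)}
# generaldelta-style storage: every delta is against p1 (rev 0 is a snapshot).
stored_base = {rev: p1 for rev, (p1, _) in parents.items()}

print(count_recomputed(list(range(7)), parents, stored_base))         # 5
print(count_recomputed([0, 1, 3, 5, 2, 4, 6], parents, stored_base))  # 1
```

Sent in storage order, five of the six deltas have to be recomputed; sending each branch as a contiguous run leaves only the jump from rev 5 back to rev 2.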

Not quite, but it's close - ~31MB compared to ~27MB.

I think I might have got my understanding backwards before. Some
background: my repo consists of a lot of fairly unrelated branches. The SVN
repo is essentially several unrelated repos implemented as different
branches, plus a number of related feature branches where no merging
occurs (they just branch off). Commits may land on any of the branches at
any time, resulting in a lot of completely unrelated interleaved commits
(hence the 1.4GB manifest).

Based on what I'm seeing, this is what I think is happening when pulling
from a remote generaldelta repo to a local generaldelta repo over ssh.
Please correct me if I've got it wrong. If I'm right, generaldelta will be
a substantial win for my repo even with the existing wire protocol.

1. Remote reorders the revisions to produce the longest chains it can, such
that prev will be a parent.

2. Remote recomputes the deltas.

There could be substantial savings here if the original order interleaves
lots of unrelated branches but the reordered one forms long chains on the
same branch. This is what I would expect with my repo.

3. Local receives the deltas and then recomputes them as generaldelta
deltas. The changesets are already in near-optimal order.
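Step 1 can be illustrated with a depth-first topological sort: walk down from each head and emit a revision only after its parents, so that each branch comes out as one contiguous run. This is a simplified Python sketch of the idea, not Mercurial's actual reordering code; `linearize`, `prev_is_parent` and the toy history are hypothetical.

```python
from typing import Dict, List, Tuple

NULLREV = -1

Parents = Dict[int, Tuple[int, int]]


def linearize(parents: Parents) -> List[int]:
    """Order revisions so every revision follows its parents and, where
    possible, comes right after its p1."""
    has_child = {p for pair in parents.values() for p in pair if p != NULLREV}
    heads = sorted(r for r in parents if r not in has_child)
    emitted, order = set(), []
    for head in heads:
        stack = [(head, False)]
        while stack:
            rev, parents_done = stack.pop()
            if rev in emitted:
                continue
            if parents_done:
                emitted.add(rev)
                order.append(rev)
                continue
            stack.append((rev, True))
            p1, p2 = parents[rev]
            # p2 is pushed last, so its line is emitted first and p1 ends up
            # immediately before rev (unless p1 was already emitted).
            for p in (p1, p2):
                if p != NULLREV and p not in emitted:
                    stack.append((p, False))
    return order


def prev_is_parent(order: List[int], parents: Parents) -> int:
    """Count revisions whose predecessor in `order` is one of their parents."""
    return sum(1 for prev, rev in zip(order, order[1:]) if prev in parents[rev])


# Hypothetical interleaved history: branches 1 -> 3 -> 5 and 2 -> 4 -> 6 off rev 0.
parents = {0: (NULLREV, NULLREV), 1: (0, NULLREV), 2: (0, NULLREV),
           3: (1, NULLREV), 4: (2, NULLREV), 5: (3, NULLREV), 6: (4, NULLREV)}
order = linearize(parents)
print(order)                                   # [0, 1, 3, 5, 2, 4, 6]
print(prev_is_parent(list(range(7)), parents))  # 1
print(prev_is_parent(order, parents))           # 5
```

In storage order only one revision has a parent as prev; after reordering five of six do, which is the effect described in steps 1 and 2.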

The output of hg debugrevlog -m supports this:

original standard repo:

deltas against prev  : 54779 (100.00%)
    where prev = p1  : 46283     (84.49%)
    where prev = p2  :   240     ( 0.44%)
    other            :  8256     (15.07%)

generaldelta cloned locally from standard (numbers are slightly lower here
because I hadn't pulled in all changesets at this point):

deltas against prev  : 47701 (86.95%)
    where prev = p1  : 46245     (96.95%)
    where prev = p2  :    91     ( 0.19%)
    other            :  1365     ( 2.86%)
deltas against p1    :  7129 (12.99%)
deltas against p2    :    32 ( 0.06%)
deltas against other :     0 ( 0.00%)

generaldelta cloned from generaldelta over ssh:

deltas against prev  : 55390 (99.69%)
    where prev = p1  : 55341     (99.91%)
    where prev = p2  :     3     ( 0.01%)
    other            :    46     ( 0.08%)
deltas against p1    :   167 ( 0.30%)
deltas against p2    :     3 ( 0.01%)
deltas against other :     0 ( 0.00%)
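For readers unfamiliar with this output, here is roughly how those categories can be derived from a revlog's parents and delta bases. This is a hypothetical Python sketch inferred from the labels, not the actual debugrevlog implementation; full snapshots (base equal to the revision itself) are assumed to be excluded. Judging from the numbers above, the indented "where prev = ..." percentages are relative to the "deltas against prev" count and the others to the total number of deltas (e.g. 46245 / 47701 = 96.95%).

```python
from typing import Dict, List, Tuple

NULLREV = -1


def delta_stats(parents: List[Tuple[int, int]], bases: List[int]) -> Dict[str, int]:
    """Classify each delta the way the debugrevlog -m labels suggest.
    parents[rev] is (p1, p2); bases[rev] is rev's delta base."""
    stats = dict.fromkeys(
        ["prev", "prev=p1", "prev=p2", "prev=other", "p1", "p2", "other"], 0)
    for rev, ((p1, p2), base) in enumerate(zip(parents, bases)):
        if base == rev:
            continue  # full snapshot, not a delta
        prev = rev - 1
        if base == prev:
            stats["prev"] += 1
            if prev == p1:
                stats["prev=p1"] += 1
            elif prev == p2:
                stats["prev=p2"] += 1
            else:
                stats["prev=other"] += 1
        elif base == p1:
            stats["p1"] += 1
        elif base == p2:
            stats["p2"] += 1
        else:
            stats["other"] += 1
    return stats


def pct(count: int, total: int) -> str:
    """Percentage with two decimals, as in the tables above."""
    return f"{100.0 * count / total:.2f}"


# Hypothetical toy history: branches 1 -> 3 -> 5 and 2 -> 4 -> 6 off rev 0.
parents = [(NULLREV, NULLREV), (0, NULLREV), (0, NULLREV),
           (1, NULLREV), (2, NULLREV), (3, NULLREV), (4, NULLREV)]
standard = [0, 0, 1, 2, 3, 4, 5]       # pre-generaldelta: always against rev - 1
generaldelta = [0, 0, 0, 1, 2, 3, 4]   # always against p1
print(delta_stats(parents, standard))
print(delta_stats(parents, generaldelta))
print(pct(46283, 54779), pct(46245, 47701))  # 84.49 96.95
```

The standard layout puts every delta in "prev", most of them "other" because the history is interleaved, while the generaldelta layout moves them to "p1", mirroring the first two tables.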

Tim Delaney


More information about the Mercurial-devel mailing list