[PATCH] Add script to rewrite manifest to workaround lack of parent deltas

Benoit Boissinot benoit.boissinot at ens-lyon.org
Tue Aug 25 07:40:31 CDT 2009


On Fri, Aug 21, 2009 at 06:17:17PM -0400, Greg Ward wrote:
> On Fri, Aug 21, 2009 at 5:58 PM, I wrote:
> >> Maybe it's cleaner to pop from the end
> >>        i = visit.pop()
> > [...]
> >>> +            if len(parents_with_child) == 0:
> >>> +                next.append(c)
> >>> +        visit = next + visit
> >> if you pop from the end, then you can do:
> >>        visit += next
> >
> > Not only is it cleaner, it massively improves the shrinkage factor on
> > my test repo: from 8.5x smaller to 16.9x smaller.  Specifically,
> > pop(0) with "visit = next + visit" shrank a 56.1 MB manifest to 6.6
> > MB; fiddling with the order shrank it to 3.3 MB instead.  Wow!  I
> > suppose I should test it for correctness though.  ;-)
> 
> Damn.  Should have known it was too good to be true; this makes the
> sort rather unstable.  Example: with the original, visit.pop(0) and
> 'visit = next + visit' version, repeated runs of shrink-manifest look
> like this:
> 
> $ ~/src/hg-crew/contrib/shrink-manifest.py
> reading 15043 revs ................
> sorting ...
> writing 15043 revs ................
> old file size:     58830219 bytes (  56.1 MiB)
> new file size:      6929734 bytes (   6.6 MiB)
> shrinkage: 88.2% (8.5x)
> $ rm .hg/store/*.old
> $ ~/src/hg-crew/contrib/shrink-manifest.py
> reading 15043 revs ................
> sorting ...
> writing 15043 revs ................
> old file size:      6929734 bytes (   6.6 MiB)
> new file size:      6929734 bytes (   6.6 MiB)
> shrinkage: 0.0% (1.0x)
> 
> Good: stable and predictable.
> 
> But with visit.pop() and "visit += next", it's not so good:
> 
> $ ~/src/hg-crew/contrib/shrink-manifest.py
> reading 15043 revs ................
> sorting ...
> writing 15043 revs ................
> old file size:     58830219 bytes (  56.1 MiB)
> new file size:      3472373 bytes (   3.3 MiB)
> shrinkage: 94.1% (16.9x)
> $ rm .hg/store/*.old
> $ ~/src/hg-crew/contrib/shrink-manifest.py
> reading 15043 revs ................
> sorting ...
> writing 15043 revs ................
> old file size:      3472373 bytes (   3.3 MiB)
> new file size:     17573799 bytes (  16.8 MiB)
> shrinkage: -406.1% (0.2x)
> $ rm .hg/store/*.old
> $ ~/src/hg-crew/contrib/shrink-manifest.py
> reading 15043 revs ................
> sorting ...
> writing 15043 revs ................
> old file size:     17573799 bytes (  16.8 MiB)
> new file size:      5870243 bytes (   5.6 MiB)
> shrinkage: 66.6% (3.0x)
> 
> Yuck: I don't like that behaviour.  It should be OK to run
> shrink-manifest repeatedly on an already-shrunk manifest.  Reverting
> to the original algorithm.
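For illustration, the two visit orders compared above can be sketched on
a toy DAG. This is not the shrink-manifest.py code: toposort(),
renumber() and the six-revision graph are hypothetical, and the sketch
assumes dicts keep insertion order (Python 3.7+). It only shows that
re-running the pop-from-end variant on its own output can produce a
different order, while the pop(0) variant settles immediately here.

```python
def toposort(parents, pop_from_end=False):
    """Order revisions so that every parent precedes its children.

    parents: dict {rev: [parent revs]}, iterated in current storage order.
    pop_from_end=False -> visit.pop(0) with visit = ready + visit
    pop_from_end=True  -> visit.pop()  with visit += ready
    """
    children = {rev: [] for rev in parents}
    pending = {rev: set(ps) for rev, ps in parents.items()}
    for rev, ps in parents.items():
        for p in ps:
            children[p].append(rev)

    visit = [rev for rev, ps in parents.items() if not ps]  # root revisions
    result = []
    while visit:
        cur = visit.pop() if pop_from_end else visit.pop(0)
        result.append(cur)
        ready = []
        for c in children[cur]:
            pending[c].discard(cur)
            if not pending[c]:  # all parents of c already emitted
                ready.append(c)
        if pop_from_end:
            visit += ready
        else:
            visit = ready + visit
    return result


def renumber(parents, order):
    """Rewrite the graph as if the revisions had been stored in `order`."""
    newrev = {old: new for new, old in enumerate(order)}
    return {newrev[old]: [newrev[p] for p in parents[old]] for old in order}


# Toy history: two branches off rev 0, merged in rev 5.
dag = {0: [], 1: [0], 2: [0], 3: [1], 4: [2], 5: [3, 4]}

first = toposort(dag)
print(first)                                      # [0, 1, 3, 2, 4, 5]
print(toposort(renumber(dag, first)))             # [0, 1, 2, 3, 4, 5]

first_end = toposort(dag, pop_from_end=True)
print(first_end)                                  # [0, 2, 4, 1, 3, 5]
print(toposort(renumber(dag, first_end), True))   # [0, 3, 4, 1, 2, 5]
```

On this graph the second pop(0) run returns the identity order, so the
rewritten file would not change again. The pop-from-end variant swaps
the two branches on every run, which is at least consistent with the
changing file sizes in the transcript above.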

I have an almost stable algorithm that stays around 3.3 MiB across
repeated runs. Do you think it needs to be more stable? I can play with
it more if needed.

regards,

Benoit


PS: Dirkjan, maybe you want to play with it to see if it makes a
difference on the Python repo.

-- 
:wq
-------------- next part --------------
A non-text attachment was scrubbed...
Name: shrink.py
Type: text/x-python
Size: 6065 bytes
Desc: not available
Url : http://selenic.com/pipermail/mercurial-devel/attachments/20090825/8d76f2ba/attachment.py 

