[PATCH] Add script to rewrite manifest to workaround lack of parent deltas

Greg Ward greg-hg at gerg.ca
Fri Aug 21 16:58:13 CDT 2009


On Thu, Aug 20, 2009 at 7:04 PM, Benoit
Boissinot<benoit.boissinot at ens-lyon.org> wrote:
> On Thu, Aug 20, 2009 at 05:20:24PM -0400, Greg Ward wrote:
>> # HG changeset patch
>> # User Greg Ward <greg-hg at gerg.ca>
>> # Date 1233047576 0
>> # Node ID 7e0bbea3935b1044d3c5acfecae8005941dfc8ec
>> # Parent  2484868cffde3893e3fafb8e515d396346b87e17
>> Add script to rewrite manifest to workaround lack of parent deltas.
>
> Thanks for cleaning it up. Some comments below.
>
>> +
>> +def good_sort(rl):
>
> maybe toposort() ?

Yep.  Renamed the other two functions to be more Hg-ish too.

>
> You can directly iterate on the revs:
>
> for i in rl:

Fixed.

> The following is simpler:
>
>        children[i] = []
>        parents = [p for p in rl.parentrevs(i) if p != -1]
>        for p in parents:
>            assert p in children
>            children[p].append(i)
>        if len(parents) == 0:
>            root.append(i)

Duh, of course.  Thanks.  Fixed.  Also used node.nullrev as suggested
by Dirkjan.

> Maybe it's cleaner to pop from the end
>        i = visit.pop()
[...]
>> +            if len(parents_with_child) == 0:
>> +                next.append(c)
>> +        visit = next + visit
> if you pop from the end, then you can do:
>        visit += next

Not only is it cleaner, it massively improves the shrinkage factor on
my test repo: from 8.5x smaller to 16.9x smaller.  Specifically,
pop(0) with "visit = next + visit" shrank a 56.1 MB manifest to 5.5
MB; fiddling with the order shrank it to 3.3 MB instead.  Wow!  I
suppose I should test it for correctness though.  ;-)

>> +        ret.append(i)
>> +        if i not in children:
>> +            # this only happens if some node's p1 == p2, which can happen in the
>> +            # manifest in certain circumstances
>> +            break
>
> break or continue ?

Ooh, good catch.  Fixed.

>> +        next = []
>> +        for c in children.pop(i):
>> +            parents_with_child = [p for p in rl.parentrevs(c) if p != -1 and p in children]
>
> parents_unseen is maybe better, we don't care if they have children, but
> we care if we already visited them.

I'll take your word for it.  I don't entirely understand the
algorithm, and most of the names are taken from your original code.
So if you think parents_unseen is a better name, I'm all for it.
Done.

>> +def main():
[...]
>> +    (tmpfd, tmpindexfn) = tempfile.mkstemp(
>> +        dir=repo.join('store'), prefix='00manifest.', suffix='.i')
>
> I found it a bit cleaner to split after at least one arg.

OK, sure.  Done.

> Is the non-nested except/try/finally possible with python 2.4 ?

Oops.  Darn, that's my favourite 2.5 feature.  Reverted to 2.4 syntax.

Thanks for the review!  New patch coming soon.

Greg



More information about the Mercurial-devel mailing list