[PATCH] Add script to rewrite manifest to workaround lack of parent deltas

Greg Ward greg-hg at gerg.ca
Mon Aug 24 16:06:49 CDT 2009


On Mon, Aug 24, 2009 at 12:26 PM, Benoit
Boissinot<benoit.boissinot at ens-lyon.org> wrote:
> On Fri, Aug 21, 2009 at 06:18:02PM -0400, Greg Ward wrote:
>> # HG changeset patch
>> # User Greg Ward <greg-hg at gerg.ca>
>> # Date 1233047576 0
>> # Node ID 4b03b395c5923319c1d9986ab47e6e3d99bfc6f8
>> # Parent  b47fa638bfc7d6e5185f49113a6c418fbe6cf0b3
>> Add script to rewrite manifest to workaround lack of parent deltas.
>
> Other comments (sorry I should have batched the review).

No worries.  I fixed all the join() calls to sjoin() as requested.

> If you want to use .pop() (I'm not sure it's worth it):

Not worth it IMHO.  I believe that pop() is slightly faster than
pop(0), but who cares?   Performance here is utterly totally
completely absolutely dominated by writerevs().  E.g. here's me
running shrink-manifest on our full repo converted from CVS:

"""
reading 108649 revs
.............................................................................................................
sorting ...
writing 108649 revs
.............................................................................................................
old file size:   3141546489 bytes (2996.0 MiB)
new file size:     59853864 bytes (  57.1 MiB)
shrinkage: 98.1% (52.5x)
~/src/hg-crew/contrib/shrink-manifest.py  2859.49s user 149.52s system
98% cpu 51:05.36 total
"""

Reading and sorting took a few seconds, even less with a hot disk
cache.  Writing the sorted manifest took almost an hour.  So who the
heck cares about performance of reading and sorting?

>> +    indexfn = repo.join('store/00manifest.i')
>> +    datafn = indexfn[:-2] + '.d'
>> +    if not os.path.exists(datafn):
>> +        sys.exit('error: %s does not exist: manifest not big enough '
>> +                 'to be worth shrinking' % datafn)
>> +
>
> I've found the warning annoying so I changed it like this:
>
>> +    oldindexfn = indexfn + '.old'
>> +    olddatafn = datafn + '.old'
>
>    oldindexfn = tempfile.mktemp(dir=repo.sjoin(''),
>                                 prefix='00manifest.',
>                                 suffix='.i.old')
>    olddatafn = oldindexfn[:-6] + '.d.old'

1) except tempfile.mktemp() is unsafe and should not be used
2) I don't understand how this avoids reading a 00manifest.d that does not exist

...so I'll resend without fixing this complaint yet.  Please reply to
the next patch so we can sort this problem out.

>> +    if os.path.exists(oldindexfn) or os.path.exists(olddatafn):
>> +        sys.exit('error: one or both of %s or %s exists from a previous run; '
>> +                 'please clean up before running again'
>> +                 % (oldindexfn, olddatafn))
>> +
>
> Final note: If someone wants to do that, the script could accept
> arbitrary revlogs as arguments, except the changelog (because it would
> need fixing the linkrevs).

Running this on an arbitrary revlog -- really, a filelog -- smells
like YAGNI to me.  I mean, yeah, sure, lack of parent deltas can cause
any revlog to be larger than it should be.  But most complaints are
about giant manifest files.  That's why I called the script
shrink-manifest, and that's why it only shrinks
.hg/store/00manifest.*.  We can refactor to support filelogs later if
it turns out to be necessary.  (In which case it should probably be
renamed to shrink-revlog, which will make it less obvious to someone
wondering what to do about their giant manifest file.)

Also, the only use case I can think of for running shrink-manifest
twice in a row (i.e. when the *.old files already exist) is testing
that it makes no change on a second run.  And that's really something
that only needs to be done by developers testing the script.  So I'd
rather not try to accommodate that and risk removing someone's *.old
files accidentally.

OTOH, shrink-manifest should probably print a note saying "left behind
00manifest*.old; delete them when you are satisified it worked".  I'll
add that.

Greg



More information about the Mercurial-devel mailing list