[PATCH] Add script to rewrite manifest to workaround lack of parent deltas
Greg Ward
greg-hg at gerg.ca
Mon Aug 24 16:06:49 CDT 2009
On Mon, Aug 24, 2009 at 12:26 PM, Benoit
Boissinot<benoit.boissinot at ens-lyon.org> wrote:
> On Fri, Aug 21, 2009 at 06:18:02PM -0400, Greg Ward wrote:
>> # HG changeset patch
>> # User Greg Ward <greg-hg at gerg.ca>
>> # Date 1233047576 0
>> # Node ID 4b03b395c5923319c1d9986ab47e6e3d99bfc6f8
>> # Parent b47fa638bfc7d6e5185f49113a6c418fbe6cf0b3
>> Add script to rewrite manifest to workaround lack of parent deltas.
>
> Other comments (sorry I should have batched the review).
No worries. I fixed all the join() calls to sjoin() as requested.
> If you want to use .pop() (I'm not sure it's worth it):
Not worth it IMHO. I believe that pop() is slightly faster than
pop(0), but who cares? Performance here is utterly totally
completely absolutely dominated by writerevs(). E.g. here's me
running shrink-manifest on our full repo converted from CVS:
"""
reading 108649 revs
.............................................................................................................
sorting ...
writing 108649 revs
.............................................................................................................
old file size: 3141546489 bytes (2996.0 MiB)
new file size: 59853864 bytes ( 57.1 MiB)
shrinkage: 98.1% (52.5x)
~/src/hg-crew/contrib/shrink-manifest.py 2859.49s user 149.52s system
98% cpu 51:05.36 total
"""
Reading and sorting took a few seconds, even less with a hot disk
cache. Writing the sorted manifest took almost an hour. So who the
heck cares about performance of reading and sorting?
>> + indexfn = repo.join('store/00manifest.i')
>> + datafn = indexfn[:-2] + '.d'
>> + if not os.path.exists(datafn):
>> + sys.exit('error: %s does not exist: manifest not big enough '
>> + 'to be worth shrinking' % datafn)
>> +
>
> I've found the warning annoying so I changed it like this:
>
>> + oldindexfn = indexfn + '.old'
>> + olddatafn = datafn + '.old'
>
> oldindexfn = tempfile.mktemp(dir=repo.sjoin(''),
> prefix='00manifest.',
> suffix='.i.old')
> olddatafn = oldindexfn[:-6] + '.d.old'
1) except tempfile.mktemp() is unsafe and should not be used
2) I don't understand how this avoids reading a 00manifest.d that does not exist
...so I'll resend without fixing this complaint yet. Please reply to
the next patch so we can sort this problem out.
>> + if os.path.exists(oldindexfn) or os.path.exists(olddatafn):
>> + sys.exit('error: one or both of %s or %s exists from a previous run; '
>> + 'please clean up before running again'
>> + % (oldindexfn, olddatafn))
>> +
>
> Final note: If someone wants to do that, the script could accept
> arbitrary revlogs as arguments, except the changelog (because it would
> need fixing the linkrevs).
Running this on an arbitrary revlog -- really, a filelog -- smells
like YAGNI to me. I mean, yeah, sure, lack of parent deltas can cause
any revlog to be larger than it should be. But most complaints are
about giant manifest files. That's why I called the script
shrink-manifest, and that's why it only shrinks
.hg/store/00manifest.*. We can refactor to support filelogs later if
it turns out to be necessary. (In which case it should probably be
renamed to shrink-revlog, which will make it less obvious to someone
wondering what to do about their giant manifest file.)
Also, the only use case I can think of for running shrink-manifest
twice in a row (i.e. when the *.old files already exist) is testing
that it makes no change on a second run. And that's really something
that only needs to be done by developers testing the script. So I'd
rather not try to accommodate that and risk removing someone's *.old
files accidentally.
OTOH, shrink-manifest should probably print a note saying "left behind
00manifest*.old; delete them when you are satisified it worked". I'll
add that.
Greg
More information about the Mercurial-devel
mailing list