Something changed shrink-revlog for the worse
Greg Ward
greg at gerg.ca
Thu Jan 7 08:08:36 CST 2010
On Thu, Jan 7, 2010 at 8:40 AM, Dirkjan Ochtman wrote:
> I tried to verify your results. First, I checked two versions of the
> shrink-script. There was a change from Benoit early in December in the
> shrink script (you didn't mention it anywhere). I think it's not in
> 1.4.x, though:
>
> changeset: 10009:69dca8574a6a
> user: Benoit Boissinot <benoit.boissinot at ens-lyon.org>
> date: Fri Dec 04 15:36:13 2009 +0100
> summary: shrink-revlog: improve performance: use changegroup
> instead of revisions
Oops: I've been using shrink-revlog from an hg-crew clone, but it
looks like I forgot to pull for a long time. So I have not been
running any conversions/shrinks with that changeset, meaning it can't
be the culprit.
> I tried all of these on an old conversion of python to hg I had lying
> around. It goes from 872.2 MiB to 14.8 MiB, so it may not be quite as
> branchy as your work repo, but at least there's significant reduction.
No, our work repo is really quite amazingly branchy. It's astounding
that we've kept lurching along on top of CVS for as long as we have.
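For anyone wanting to compare before/after figures like the ones quoted above, a minimal sketch of measuring the manifest revlog on disk follows. It assumes the repository uses the store layout, so the manifest lives in .hg/store/00manifest.i (plus 00manifest.d when the data is not inline). The helper names revlog_size and mib are made up for this illustration and are not part of Mercurial or shrink-revlog.

```python
import os


def revlog_size(repo_root, name="00manifest"):
    """Return the combined size in bytes of a revlog's .i and .d files.

    Assumes the store layout (.hg/store); older repositories without the
    store requirement keep these files directly under .hg instead.
    """
    store = os.path.join(repo_root, ".hg", "store")
    total = 0
    for ext in (".i", ".d"):
        path = os.path.join(store, name + ext)
        # The .d file is absent when revlog data is stored inline in the .i file.
        if os.path.exists(path):
            total += os.path.getsize(path)
    return total


def mib(nbytes):
    """Convert a byte count to MiB."""
    return nbytes / (1024.0 * 1024.0)
```

Running revlog_size on the repository before and after the shrink, and printing both through mib, gives numbers directly comparable to the ones above.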
> I didn't check time running, though I did get the impression the new
> version was a little snappier.
Good! It takes over an hour to shrink my manifest right now. And
that's after a 12-13 hour conversion from CVS.
> After verifying that the pre-69dca8 version and the post-69dca8
> version gave the same on-disk improvement, I tried running the latest
> version of the script against three versions of Mercurial. I inserted
> a sys.path insertion at the top of the shrink script and print
> changegroup.__file__ to be sure it got the right path. I had three
> steps between each run: revert manifests to the old versions (simply
> moving .old back), updating my crew clone with up -C, then make local
> in my crew clone.
>
> Unfortunately, I was unable to reproduce your results: each run gave
> me exactly the same resulting repo size.
Thank you for doing that: it confirms a similar (but less extensive)
test that I did yesterday, which also led me to believe that
underlying Mercurial changes have *not* directly affected the output
of shrink-revlog.
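The check described above (a sys.path insertion at the top of the script, then printing changegroup.__file__) could look roughly like the sketch below. This is an illustration only, not the code actually used in the test: import_from is a hypothetical helper, and it is written against Python 3's importlib rather than the Python 2 that Mercurial ran on at the time.

```python
import importlib
import sys


def import_from(source_dir, module_name):
    """Import module_name with source_dir placed first on sys.path.

    Returns the module so the caller can inspect module.__file__ and
    confirm which checkout it was actually loaded from.
    """
    sys.path.insert(0, source_dir)
    # Forget any already-imported copy of the top-level package so the
    # import below is resolved again against the modified path.
    top = module_name.split(".")[0]
    for name in list(sys.modules):
        if name == top or name.startswith(top + "."):
            del sys.modules[name]
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

In the shrink script this would be called before any other Mercurial import, for example import_from("/path/to/clone", "mercurial.changegroup"), where the path is a placeholder for the checkout under test, followed by printing the returned module's __file__.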
I think that leaves 3 possibilities, in order of increasing probability:
  1) changes in the data
     1a) someone added something to our CVS repository that magically
         changed the past history of the Mercurial manifest log
     1b) I modified the input to cvs2hg (i.e. which subset of our CVS
         repo gets converted) in a way that affected the past history
         of the manifest log
  2) changes to Mercurial (1.4 vs 1.3) that affect the nature of the
     manifest log produced by cvs2hg, resulting in much worse results
     from shrink-revlog
  3) changes to my conversion scripts that similarly affect the nature
     of the manifest log and thus the result of shrink-revlog
The main reason I posted here was to see if anyone could think of any
evidence for hypothesis #2.
Considering that a full conversion+shrink takes around 14-15 hours, I
guess I'd better see if I can reproduce this by converting only a
couple thousand changesets. (I already tried it with a much smaller
subset of our CVS repository, and could not reproduce the poor
shrink-revlog result.)
Thanks!
Greg