Something changed shrink-revlog for the worse

Greg Ward greg at gerg.ca
Sun Jan 10 12:36:14 CST 2010


On Thu, Jan 7, 2010 at 4:28 PM, Greg Ward <greg at gerg.ca> wrote:
> Been digging into my problem with shrink-revlog some more.  Still no
> clear answers, but here is some more interesting data.

On the off chance that anyone is still reading my ramblings on this
matter, I have found the change that made my shrunken manifest jump
from 55 MB to 460 MB.  As I suspected, it wasn't a change to
Mercurial, but to my conversion scripts.  Specifically, adding dummy
merges of ~30 old CVS development branches and release branches did
it.  (I already had fairly aggressive merge detection as part of the
conversion; this change was intended to handle the various branches
that were not handled by it.)

This added only 30 nodes to a graph of ~105,000.  But it tweaked the
topology of the graph just enough to make a big difference in
shrink-revlog's ability to shrink the manifest.  In hand-wavy terms,
those 30 dummy merges interrupt the flow of the trunk: when toposort()
hits a dummy merge, it has to stop, back up, and add the nodes on the
branch being merged in.  Then it can carry on writing trunk nodes.
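
To make the "stop, back up, carry on" behaviour concrete, here is a
minimal sketch of a depth-first topological sort.  It is not the
toposort() from shrink-revlog; the function body, the `parents` mapping
and the node names (t* for trunk, b* for a branch, m for a dummy merge)
are all assumptions made up for illustration.

```python
def toposort(parents):
    """Order nodes so that every parent comes before its children.

    `parents` maps each node to a tuple of its parents (empty for a root).
    Illustrative only: this is NOT shrink-revlog's actual algorithm.
    """
    # Heads are nodes that are nobody's parent; each walk starts from one.
    has_child = {p for ps in parents.values() for p in ps}
    heads = [n for n in parents if n not in has_child]

    order, emitted = [], set()
    for head in heads:
        stack = [head]
        while stack:
            node = stack[-1]
            pending = [p for p in parents[node] if p not in emitted]
            if pending:
                # A parent has not been written yet: back up and handle
                # it first.  At a merge this happens once per parent.
                stack.append(pending[0])
            else:
                stack.pop()
                if node not in emitted:
                    emitted.add(node)
                    order.append(node)
    return order


# Hypothetical toy history: a trunk t0..t3 with one branch b0..b1 that
# is merged back into the trunk by the dummy merge m.
parents = {
    "t0": (),
    "t1": ("t0",),
    "t2": ("t1",),
    "b0": ("t0",),
    "b1": ("b0",),
    "m": ("t2", "b1"),
    "t3": ("m",),
}

print(toposort(parents))
```

In this toy graph the branch nodes b0 and b1 are written between t2 and
m, so the run of trunk nodes is broken exactly at the merge.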

The odd thing is that adding 30 dummy merge nodes caused the number of
suboptimal manifest nodes to jump from 1477 to 7731, which is largely
responsible for the shrunken manifest exploding from 55 MB to 460 MB.
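
This message does not define "suboptimal".  The sketch below assumes it
means a node whose immediate predecessor in the new ordering is not one
of its parents, so the delta stored for it is computed against an
unrelated revision.  The function name, the `parents` mapping and the
two orderings are hypothetical.

```python
def count_suboptimal(order, parents):
    """Count nodes whose predecessor in `order` is not one of their parents.

    Assumed definition of "suboptimal"; roots (no parents) are not counted.
    """
    count = 0
    for prev, node in zip(order, order[1:]):
        if parents[node] and prev not in parents[node]:
            count += 1
    return count


# Hypothetical toy history: trunk t0..t3, branch b0..b1, dummy merge m.
parents = {
    "t0": (),
    "t1": ("t0",),
    "t2": ("t1",),
    "b0": ("t0",),
    "b1": ("b0",),
    "m": ("t2", "b1"),
    "t3": ("m",),
}

# Two valid topological orders of the same graph.
grouped = ["t0", "t1", "t2", "b0", "b1", "m", "t3"]
interleaved = ["t0", "b0", "t1", "b1", "t2", "m", "t3"]

print(count_suboptimal(grouped, parents))
print(count_suboptimal(interleaved, parents))
```

Under that assumed definition the count depends only on the ordering,
not on the number of nodes: the same seven nodes give 1 suboptimal node
when each line of development is kept together and 3 when the two lines
are interleaved.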

Conclusion: there exist small perturbations to the source revlog that
cause large changes in the size of shrink-revlog's output.  I'm going
to go spend some pencil-and-paper time now to see if there exists a
tweak to the toposort() algorithm that does not exhibit this
behaviour.

Greg

