Something changed shrink-revlog for the worse

Greg Ward greg at gerg.ca
Mon Jan 11 20:52:51 CST 2010


On Mon, Jan 11, 2010 at 8:48 PM, Greg Ward <greg at gerg.ca> wrote:
> Hmmm.  I think my first patch to shrink-revlog will be to make it
> possible to select between different toposorts.  I think this
> discussion will be much easier if we can add toposorts at will and
> test them against each other in situ, rather than slinging patches
> around.  If a clear winner emerges, we can go back to having just one.
>  If not, we can leave the selectability in place.  Sound good?

That was pretty easy.  Here, let me know what you think:

"""
# HG changeset patch
# User Greg Ward <greg-hg at gerg.ca>
# Date 1263264321 18000
# Node ID 9198909623e0ecaa6f25a19f122b88ce3aaecf74
# Parent  eddcdaed1d6bdeec2c1f3bb7a5583ec3b978a07d
shrink: add --sort option for user-selectable toposort algorithm.

diff --git a/contrib/shrink-revlog.py b/contrib/shrink-revlog.py
--- a/contrib/shrink-revlog.py
+++ b/contrib/shrink-revlog.py
@@ -24,7 +24,7 @@
 from mercurial import ui as ui_, hg, revlog, transaction, node, util
 from mercurial import changegroup

-def toposort(ui, rl):
+def toposort_branchsort(ui, rl):

     children = {}
     root = []
@@ -123,9 +123,18 @@
     ui.write('shrinkage: %.1f%% (%.1fx)\n' % (shrink_percent, shrink_factor))

 def shrink(ui, repo, **opts):
+    """shrink a revlog by reordering revisions
+
+    Rewrites all the entries in some revlog of the current repository
+    (the manifest log by default) to save space.
+
+    Different sort algorithms have different performance
+    characteristics.  Use ``--sort`` to select a sort algorithm so you
+    can determine which works best for your data.  The default
+    algorithm, ``branchsort``, works well for workflows with lots of
+    active (unmerged) branches, but not so well when all branches have
+    been merged and there is only one repository head.
     """
-    Shrink revlog by re-ordering revisions. Will operate on manifest for
-    the given repository if no other revlog is specified."""

     # Unbuffer stdout for nice progress output.
     sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
@@ -147,6 +156,12 @@
             raise util.Abort('--revlog option must specify a revlog in %s, '
                              'not %s' % (store, indexfn))

+    sortname = opts['sort']
+    try:
+        toposort = globals()['toposort_' + sortname]
+    except KeyError:
+        raise util.Abort('no such toposort algorithm: %s' % sortname)
+
     datafn = indexfn[:-2] + '.d'
     if not os.path.exists(indexfn):
         raise util.Abort('no such file: %s' % indexfn)
@@ -220,7 +235,9 @@

 cmdtable = {
     'shrink': (shrink,
-               [('', 'revlog', '', 'index (.i) file of the revlog to shrink')],
+               [('', 'revlog', '', 'index (.i) file of the revlog to shrink'),
+                ('', 'sort', 'branchsort', 'name of sort algorithm to use'),
+               ],
"""
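
For anyone who wants to poke at the lookup without applying the patch,
here is a minimal standalone sketch of the same name-based dispatch.
It is only an illustration, not part of the patch: the Abort class is a
stand-in for util.Abort, toposort_reverse is a hypothetical second
algorithm, and both sort functions are placeholders that work on a plain
list rather than a revlog.

```python
# Standalone sketch of the name-based dispatch behind --sort.
# Assumptions: Abort stands in for mercurial.util.Abort, and
# toposort_reverse is a hypothetical algorithm used only for illustration.

class Abort(Exception):
    """Stand-in for mercurial.util.Abort."""


def toposort_branchsort(revs):
    # Placeholder body: the real function takes (ui, rl) and sorts a revlog.
    return list(revs)


def toposort_reverse(revs):
    # Hypothetical second algorithm, present only so there is a choice.
    return list(reversed(revs))


def select_toposort(sortname):
    """Return the module-level function named toposort_<sortname>."""
    try:
        return globals()['toposort_' + sortname]
    except KeyError:
        raise Abort('no such toposort algorithm: %s' % sortname)


if __name__ == '__main__':
    toposort = select_toposort('branchsort')
    print(toposort([0, 1, 2]))
```

With this scheme, adding an algorithm just means defining another
module-level function whose name starts with toposort_; an unknown
name aborts with the same message as in the patch.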

This is not quite ready to push, since this is part of a patch series
that depends on the "shrink-revlog: help/doc tweaks" patch I sent the
other day.  Dirkjan said he would trim that patch and push it, so I
will wait on him before sending all my current patches.  Still, does
this "shrink --sort" option seem sensible?  I know there is some
overlap with convert's --*sort options, but I'm not sure how to
reconcile the two.  Ideas?

Greg

