Closing and/or dropping old branches at conversion

Greg Ward greg-hg at gerg.ca
Fri Apr 24 17:37:53 CDT 2009


Hi all --

one problem with converting any fair-sized CVS repository is the
number of old branches.  For example, my current conversion yields a
Mercurial repo with 149 named branches and 140 active heads.  (I'm
doing OK at recognizing CVS merge commits and turning them into merge
changesets, at least for our recent release branches -- that's the
only reason I don't have 149 active heads.)

Of course, most of those "active" heads have not seen any activity in
years.  Either they were merged to trunk, in which case the history on
the trunk is fine, or they were never merged, in which case -- who
cares?  If a 6-year-old devel branch never got merged, I can assume no
one cares about it.

Sneaking merges into the conversion process is already handled by
convert.cvsps.mergefrom and splicemap.  So that's not what I'm here to
talk about.

Instead, I'd like a way to mark certain branches closed, and *maybe*
I'd like to simply drop certain branches from the conversion.  I have
already implemented "close on convert", and it was surprisingly easy.
For your amusement, here is the patch (NOT for pushing yet: this is
just for feedback):

# HG changeset patch
# User Greg Ward <greg-hg at gerg.ca>
# Date 1240610705 14400
# Node ID ed3294b216b728c801999ada39ea7c72849b7eae
# Parent  d153825fa665ee4fe02e0c2a8c91bffd4c9a365b
convert: make it possible to close branches as part of conversion.

User specifies which branches to close by passing a Python source file
containing a function oldbranch() to new option --oldbranch.  If
oldbranch() returns true for a particular source branch, then that
branch's head will be closed in the Mercurial target.

diff --git a/hgext/convert/__init__.py b/hgext/convert/__init__.py
--- a/hgext/convert/__init__.py
+++ b/hgext/convert/__init__.py
@@ -85,6 +85,12 @@
     (in either the source or destination revision control system) that
     should be used as the new parents for that node.

+    'oldbranch' is a file containing a Python function 'oldbranch()'
+    that is used to determine what source branches are \"old\" and
+    should therefore be marked closed in the target.  'oldbranch()'
+    takes the branch name (a string) and should return True for branches
+    to close.
+
     Mercurial Source
     -----------------

@@ -231,6 +237,7 @@
           ('r', 'rev', '', _('import up to target revision REV')),
           ('s', 'source-type', '', _('source repository type')),
           ('', 'splicemap', '', _('splice synthesized history into place')),
+          ('', 'oldbranch', '', _('specify old branches to close')),
           ('', 'datesort', None, _('try to sort changesets by date'))],
          _('hg convert [OPTION]... SOURCE [DEST [REVMAP]]')),
     "debugsvnlog":
diff --git a/hgext/convert/convcmd.py b/hgext/convert/convcmd.py
--- a/hgext/convert/convcmd.py
+++ b/hgext/convert/convcmd.py
@@ -92,10 +92,27 @@

         self.splicemap = mapfile(ui, opts.get('splicemap'))

+        oldbranchfile = opts.get('oldbranch')
+        if oldbranchfile:
+            g = {}
+            try:
+                execfile(oldbranchfile, g)
+            except Exception, err:
+                raise util.Abort(_('error reading oldbranch file \'%s\': %s')
+                                 % (oldbranchfile, err))
+            try:
+                self.oldbranch = g['oldbranch']
+            except KeyError:
+                raise util.Abort(_('oldbranch file \'%s\' contains '
+                                   'no function \'oldbranch()\'')
+                                 % oldbranchfile)
+        else:
+            self.oldbranch = None
+
     def walktree(self, heads):
         '''Return a mapping that identifies the uncommitted parents of every
         uncommitted changeset.'''
-        visit = heads
+        visit = heads[:]
         known = {}
         parents = {}
         while visit:
@@ -110,6 +127,21 @@

         return parents

+    def closeheads(self, heads):
+        '''Close old branches (heads that self.oldbranch(), which is a
+        user-supplied bit of code, says are old).'''
+        if self.oldbranch is None:
+            return
+
+        # N.B. don't need to worry about branches that are fully merged:
+        # they're not heads, so won't even be considered here.
+        for i in heads:
+            commit = self.commitcache[i]
+            if self.oldbranch(commit.branch):
+                self.ui.note("closing branch %r (commit %r)\n"
+                             % (commit.branch, i))
+                commit.extra['close'] = 1
+
     def toposort(self, parents):
         '''Return an ordering such that every uncommitted changeset is
         preceeded by all its uncommitted ancestors.'''
@@ -282,6 +314,7 @@
             heads = self.source.getheads()
             self.ui.debug("heads = %s\n" % ", ".join(heads))
             parents = self.walktree(heads)
+            self.closeheads(heads)
             self.ui.status(_("sorting...\n"))
             t = self.toposort(parents)
             num = len(t)
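
The loading and closing steps in the patch can be sketched in present-day Python as below. This is a minimal sketch, not the patch itself. `Abort` and `Commit` are hypothetical stand-ins for Mercurial's `util.Abort` and convert's commit objects. `load_oldbranch()` and `close_heads()` are illustrative names. `runpy.run_path()` takes the place of `execfile()`, which Python 3 removed.

```python
import runpy


class Abort(Exception):
    """Hypothetical stand-in for Mercurial's util.Abort."""


class Commit:
    """Hypothetical minimal stand-in for a converted commit: branch + extras."""

    def __init__(self, branch, extra=None):
        self.branch = branch
        # Copy so each commit owns its extras dict; closing one head
        # cannot modify another commit's extras.
        self.extra = dict(extra or {})


def load_oldbranch(path):
    """Run the user's file and return the oldbranch() function it defines."""
    try:
        namespace = runpy.run_path(path)  # replaces execfile() from the patch
    except Exception as err:
        raise Abort("error reading oldbranch file '%s': %s" % (path, err))
    try:
        return namespace["oldbranch"]
    except KeyError:
        raise Abort("oldbranch file '%s' contains no function 'oldbranch()'"
                    % path)


def close_heads(heads, commitcache, oldbranch):
    """Mark each head whose branch oldbranch() calls old as closed.

    Returns the closed head ids in the order they were visited."""
    closed = []
    for rev in heads:
        commit = commitcache[rev]
        if oldbranch(commit.branch):
            commit.extra["close"] = 1  # same marker the patch sets
            closed.append(rev)
    return closed
```

As in the patch, fully merged branches never reach `close_heads()`, because they contribute no heads.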

Note that rather than invent a file format for specifying "old"
branches, I let the user supply code instead.  This works very nicely
for me; all I have to do is pass this source file to convert
--oldbranch:

"""
import re

def oldbranch(branch):
    return not re.match(r'^RELEASE-3-[56789]-\d+', branch)
"""
and all of our recent release branches will be left open; everything
else will be marked closed.  Works for me.
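
To see which branch names that predicate treats as old, here it is applied to a few sample names. The names are made up for illustration. Only the function body comes from the file above.

```python
import re


def oldbranch(branch):
    # Same predicate as the file above: RELEASE-3-5-<n> through
    # RELEASE-3-9-<n> stay open; every other branch is "old".
    return not re.match(r'^RELEASE-3-[56789]-\d+', branch)


names = ["RELEASE-3-5-0", "RELEASE-3-9-12", "RELEASE-3-4-2",
         "RELEASE-3-10-0", "devel-foo"]
fates = {name: "close" if oldbranch(name) else "keep" for name in names}
print(fates)
```

Because `[56789]` matches a single digit, a future `RELEASE-3-10-0` branch would be classified as old and closed. One possible adjustment is `r'^RELEASE-3-([5-9]|\d{2,})-\d+'`, which keeps two-digit minor releases open.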

So...

* am I breaking an unwritten rule by introducing a custom Python
source file to the conversion process? or is everyone OK with that?

* is the API "oldbranch(branch)" too simple?  it has occurred to me
that "oldbranch(branch, root, head)", where 'root' is the commit
object for the first commit on the branch and 'head' is the head of
the branch, would be more general.  (E.g. what if you want to evaluate
branches by date, or look at the log messages, or the authors, ...)
Anything else that I should throw in there?

* if I add a similar mechanism for dropping branches from the
conversion, I think the two could be unified: the user could supply a
single function, e.g. 'branchfate(branch, root, head)', that returns
"drop", "close", or any other value to keep the branch.  Your
thoughts?
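
A user-side policy for the proposed `branchfate(branch, root, head)` interface, along with the driver that sorts branches by its answer, might look roughly like this. This is a sketch under assumptions. The `Commit` tuple and its `date`/`author`/`desc` fields are hypothetical, not convert's real commit attributes. So are the three-year idle limit, the "DO NOT CONVERT" log marker, and the `apply_fates()` helper.

```python
from collections import namedtuple
from datetime import datetime, timedelta

# Hypothetical commit record; convert's real commit objects differ.
Commit = namedtuple("Commit", "date author desc")

IDLE_LIMIT = timedelta(days=3 * 365)  # assumed policy: three idle years


def branchfate(branch, root, head, now=None):
    """Example policy using the proposed (branch, root, head) signature."""
    if now is None:
        now = datetime.now()
    if "DO NOT CONVERT" in root.desc:  # hypothetical log-message marker
        return "drop"
    if now - head.date > IDLE_LIMIT:   # judge the branch by its head's age
        return "close"
    return None                        # any other value: keep the branch


def apply_fates(branches, fate_func):
    """Sort branch names into keep/close/drop buckets.

    `branches` maps branch name -> (root, head). This only classifies;
    removing a dropped branch's commits from the conversion is not shown."""
    buckets = {"keep": [], "close": [], "drop": []}
    for name in sorted(branches):
        root, head = branches[name]
        fate = fate_func(name, root, head)
        buckets[fate if fate in ("close", "drop") else "keep"].append(name)
    return buckets
```

Closing only touches each head, whereas dropping means excluding every commit unique to the branch. That is why the drop case is the larger piece of work.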

Like I said: I'm looking for feedback, especially on the proposed
interface.  The implementation is pretty trivial, although I think
dropping branches will be a bit more work.

Thanks --

Greg

