[PATCH STABLE] convert/hg: update source changeset hashes in commit messages

Matt Harbison matt_harbison at yahoo.com
Sat Apr 28 02:07:15 CDT 2012


# HG changeset patch
# User Matt Harbison <matt_harbison at yahoo.com>
# Date 1335596635 14400
# Branch stable
# Node ID 1429268ed857b275844c8874c8a050f3fb29372b
# Parent  39d1f83eb05d424fc467d2e0eece8c4a8cefd35a
convert/hg: update source changeset hashes in commit messages

Any 12-character hex string found in a commit message that is also a
valid changeset hash in the source repo is replaced in the message
with the corresponding hash from the destination repo.  This does not
affect the value stored by the convert.hg.saverev=True config option.
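
For illustration only, here is a minimal standalone sketch of the
replacement described above.  It is not the patch's code: a plain dict
(src_to_dst, a hypothetical name) stands in for the revmap and
source.lookuprev() lookups, and re.sub with a callback is used in place
of finditer plus slicing.  The sample hashes are the ones used in the
test further down.

```python
import re

# Same pattern as the patch: exactly 12 lowercase hex digits,
# delimited by word boundaries on both sides.
HASH_RE = re.compile(r'\b[0-9a-f]{12}\b')


def rewrite_hashes(text, src_to_dst):
    """Replace each 12-char hex string that is a known source hash.

    src_to_dst is a hypothetical dict mapping short source hashes to
    short destination hashes; unknown strings are left untouched.
    """
    def swap(match):
        candidate = match.group()
        return src_to_dst.get(candidate, candidate)

    return HASH_RE.sub(swap, text)


# Hypothetical mapping built from the hashes in the test below.
mapping = {
    '327daa9251fa': 'ba8636729451',
    '593cbf6fb2b4': '72535aa3aea8',
}

message = (
    "This should not get updated 123456789abc (bogus source)\n"
    "Nor this 5aff986889f00 (13 characters, so no match)\n"
    "This 327daa9251fa and this 593cbf6fb2b4 should change"
)

print(rewrite_hashes(message, mapping))
```

Because of the word boundaries, the 13-character string is never
considered, and the bogus 12-character string matches the pattern but
is left alone because it is not a known source hash.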

Since a valid source hash is required, this protects other 12
character hex strings that aren't hashes.  Note however that this
means a repo which was previously converted and had its hashes changed
will not be updated, because the hashes in the commit comments reflect
hashes that are no longer valid in the source.  A future improvement
might be to specify one or more of the previously generated shamap
files on the command line to fix up previously converted repos.
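
A rough sketch of how such a chained lookup could work, purely as an
assumption about the future improvement and not something this patch
implements.  It assumes a shamap file holds one "source dest" pair of
full hashes per line; load_shamap and resolve are hypothetical helper
names, and the hashes in the usage lines are made up.

```python
def load_shamap(lines):
    """Index an earlier shamap by the 12-char prefix of the source hash.

    Assumes each non-empty line is '<source hash> <dest hash>'.
    """
    by_prefix = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        src, dst = line.rsplit(' ', 1)
        by_prefix[src[:12]] = dst
    return by_prefix


def resolve(short_hash, old_shamap, current_map):
    """Follow an old short hash through an earlier conversion.

    old_shamap maps old short hashes to full hashes in the current
    source; current_map maps those full hashes to full destination
    hashes.  Returns the short destination hash, or None if any step
    of the chain is missing.
    """
    intermediate = old_shamap.get(short_hash)
    if intermediate is None:
        return None
    dst = current_map.get(intermediate)
    return dst[:12] if dst else None


# Made-up 40-character hashes, for illustration only.
old = load_shamap(['a' * 40 + ' ' + 'b' * 40, ''])
current = {'b' * 40: 'c' * 40}
print(resolve('a' * 12, old, current))
```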

The number of 12 character (lowercase) hex strings is 16^12 or about
281.5 trillion, so given the unlikelihood of a random string matching
a real hash (on top of however unlikely a 12 character hex string in
a message is anyway), this seems safe to do without having to opt in
with another command line switch.
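
The arithmetic above can be checked with a short sketch.  The
false_match_probability helper is hypothetical, and it assumes the
12-character string is uniformly random and that every changeset in
the source repo has a distinct short hash.

```python
# Number of distinct 12-character lowercase hex strings.
TOTAL = 16 ** 12
print(TOTAL)  # 281474976710656, about 281.5 trillion


def false_match_probability(n_changesets):
    """Chance that one random 12-char hex string equals the short hash
    of any of n_changesets changesets (assumes distinct short hashes).
    """
    return n_changesets / TOTAL


# Even for a repo with 100,000 changesets the chance is tiny.
print(false_match_probability(100000))
```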

Previously, while the tags file was updated to reflect the hashes in
the destination repo, hashes in the commit comments (generated by the
tag command or the user) were not.  Some clients, like thg, create a
hyperlink when they see a valid hash, so keeping the hashes up to
date helps with navigating the repository.

diff --git a/hgext/convert/hg.py b/hgext/convert/hg.py
--- a/hgext/convert/hg.py
+++ b/hgext/convert/hg.py
@@ -18,7 +18,7 @@
 #   source.
 
 
-import os, time, cStringIO
+import os, time, cStringIO, re
 from mercurial.i18n import _
 from mercurial.node import bin, hex, nullid
 from mercurial import hg, util, context, bookmarks, error
@@ -32,6 +32,8 @@
         self.clonebranches = ui.configbool('convert', 'hg.clonebranches', False)
         self.tagsbranch = ui.config('convert', 'hg.tagsbranch', 'default')
         self.lastbranch = None
+        self.hashregex  = None
+
         if os.path.isdir(path) and len(os.listdir(path)) > 0:
             try:
                 self.repo = hg.repository(self.ui, path)
@@ -157,6 +159,19 @@
         p2 = parents.pop(0)
 
         text = commit.desc
+
+        if revmap:
+            if self.hashregex is None:
+                self.hashregex = re.compile(r'\b[0-9a-f]{12}\b')
+
+            for m in self.hashregex.finditer(text):
+                srcrev = m.group()
+                dstrev = revmap.get(source.lookuprev(srcrev))
+
+                if dstrev:
+                    dstrev = dstrev[0:12]
+                    text = text[:m.start()] + dstrev + text[m.end():]
+
         extra = commit.extra.copy()
         if self.branchnames and commit.branch:
             extra['branch'] = commit.branch
diff --git a/tests/test-convert-hg-sink.t b/tests/test-convert-hg-sink.t
--- a/tests/test-convert-hg-sink.t
+++ b/tests/test-convert-hg-sink.t
@@ -100,6 +100,22 @@
   1 files updated, 0 files merged, 0 files removed, 0 files unresolved
   $ hg debugrename baz
   baz not renamed
+
+
+Updating messages with a single changeset hash will be tested below with the
+tags, so test 12 character hex strings that shouldn't be converted too.  Also
+put more than one in that should be changed to make sure all are converted.
+  $ cd ../orig
+  $ cat > commitmsg.txt <<EOF
+  > test hash updates (or not) in commit messages
+  > This should not get updated 123456789abc (bogus source)
+  > Nor this 5aff986889f00 (real, with trailing 0 - longer than short hash)
+  > This 327daa9251fa (first commit) and this 593cbf6fb2b4 (add some-tag) should
+  > change
+  > EOF
+  $ hg add commitmsg.txt
+  $ hg ci -l commitmsg.txt
+
   $ cd ..
 
 test tag rewriting
@@ -112,13 +128,59 @@
   scanning source...
   sorting...
   converting...
-  4 add foo and bar
-  3 remove foo
-  2 add foo/file
-  1 Added tag some-tag for changeset ad681a868e44
-  0 add baz
+  5 add foo and bar
+  4 remove foo
+  3 add foo/file
+  2 Added tag some-tag for changeset ad681a868e44
+  1 add baz
+  0 test hash updates (or not) in commit messages
   $ cd new-filemap
   $ hg tags
-  tip                                2:6f4fd1df87fb
+  tip                                3:03f5d0928e2c
   some-tag                           0:ba8636729451
   $ cd ..
+
+the original repo log
+  $ hg -R orig log --template "changeset:\t{node|short}\n{desc}\n\n"
+  changeset:	5da832893a52
+  test hash updates (or not) in commit messages
+  This should not get updated 123456789abc (bogus source)
+  Nor this 5aff986889f00 (real, with trailing 0 - longer than short hash)
+  This 327daa9251fa (first commit) and this 593cbf6fb2b4 (add some-tag) should
+  change
+  
+  changeset:	5aff986889f0
+  add baz
+  
+  changeset:	593cbf6fb2b4
+  Added tag some-tag for changeset ad681a868e44
+  
+  changeset:	ad681a868e44
+  add foo/file
+  
+  changeset:	cbba8ecc03b7
+  remove foo
+  
+  changeset:	327daa9251fa
+  add foo and bar
+  
+
+the converted repo log (the filemap removed 'foo' changesets)
+  $ hg -R new-filemap log --template "changeset:\t{node|short}\n{desc}\n\n"
+  changeset:	03f5d0928e2c
+  test hash updates (or not) in commit messages
+  This should not get updated 123456789abc (bogus source)
+  Nor this 5aff986889f00 (real, with trailing 0 - longer than short hash)
+  This ba8636729451 (first commit) and this 72535aa3aea8 (add some-tag) should
+  change
+  
+  changeset:	3c74706b1ff8
+  add baz
+  
+  changeset:	72535aa3aea8
+  Added tag some-tag for changeset ba8636729451
+  
+  changeset:	ba8636729451
+  add foo and bar
+  
+

