[PATCH STABLE?] commit: avoid a dirstate race with multiple commits in the same process

Greg Ward greg-hg at gerg.ca
Mon Mar 14 13:48:24 CDT 2011


# HG changeset patch
# User Greg Ward <greg-hg at gerg.ca>
# Date 1300128285 14400
# Branch stable
# Node ID 396982a5883383fc043d3f5ad78a0f9f0366bd06
# Parent  4bfff063aed6fb4b3c8abe8a9ec51fe1e55725bf
commit: avoid a dirstate race with multiple commits in the same process
(issue2264, issue2516)

The race happens when two commits in a row change the same file but do
not change its size, *if* those two commits happen in the same second
in the same process while holding the same repo lock.  For example:

  commit i:
    M a
    M b
  commit i+1:         # same process, same second, same repo lock
    M b               # modify b without changing its size
    M c

This first manifested in transplant, which is the most common way to
do multiple commits in the same process. But it can manifest in any
script or extension that does multiple commits under the same repo
lock.

The problem is that dirstate fails to notice the changes to b when
localrepo does the second commit, so that change gets left in
the working directory. In the context of transplant, that means either
a crash ("RuntimeError: nothing committed after transplant") or a
silently inaccurate transplant, depending on whether any other files
were modified by the second transplanted changeset.
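
To make the failure mode concrete, here is a minimal sketch of a
stat-based cleanliness check being fooled. It is a toy model written for
Python 3, not Mercurial's dirstate code: looks_clean() and stat_of() are
hypothetical helpers, and the timestamps are invented.

```python
# Toy model only: these helpers are hypothetical, not Mercurial APIs.

def stat_of(content, mtime):
    """Fake stat result: content size plus mtime truncated to whole seconds."""
    return {"size": len(content), "mtime": int(mtime)}

def looks_clean(recorded, current):
    """Treat a file as clean when size and whole-second mtime both match."""
    return (recorded["size"] == current["size"]
            and recorded["mtime"] == current["mtime"])

# Commit i writes b and records its stat data at t = 100.2 s.
recorded = stat_of(b"fix1\n", 100.2)

# Commit i+1 rewrites b at t = 100.7 s: same second, same size, new content.
current = stat_of(b"fix2\n", 100.7)

# The check cannot tell the two versions apart, so the change is missed.
missed = looks_clean(recorded, current)
print("change missed:", missed)
```

If the second write lands in a later second, or changes the size, the
same check notices it; only the same-second, same-size case slips through.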

The fix is to work a little harder in the second (and subsequent)
commits run in the same process:
- dirstate: factor out maybelookup() from write()
- localrepo: add _lastcommitfiles attribute, and use it with
  maybelookup() to force repo.status() to look harder at files
  added/modified by the *previous* commit
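
The two steps above can be sketched roughly as follows. This is a
simplified Python 3 model, not the patched code: ToyDirstate,
needs_content_check() and the sample timestamps are assumptions made for
illustration, and only the maybelookup() logic mirrors the patch.

```python
class ToyDirstate:
    """Hypothetical stand-in: maps filename -> (state, mode, size, mtime)."""

    def __init__(self):
        self._map = {}

    def normal(self, f, mode, size, mtime):
        # Record the file as clean with its stat data (whole seconds).
        self._map[f] = ("n", mode, size, int(mtime))

    def maybelookup(self, f, now):
        e = self._map[f]
        if e[0] == "n" and e[3] == now:
            # Recorded in the same second as 'now': the stat data cannot
            # be trusted, so mark the entry as 'unset'.
            self._map[f] = e = (e[0], 0, -1, -1)
        return e

    def needs_content_check(self, f):
        # An 'unset' entry forces status to compare file contents.
        return self._map[f][2] == -1


ds = ToyDirstate()
ds.normal("a", 0o644, 5, 100.2)   # files written by the previous commit
ds.normal("b", 0o644, 5, 100.2)
lastcommitfiles = ["a", "b"]

now = 100  # filesystem time (whole seconds) when the next commit starts
for f in lastcommitfiles:
    if f in ds._map:
        ds.maybelookup(f, now)

flagged = sorted(f for f in ds._map if ds.needs_content_check(f))
print(flagged)
```

With now = 101 instead, nothing is flagged and the next status call can
keep trusting the recorded stat data.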

Incidentally, there is a simpler fix: call dirstate.normallookup() on
every file modified by commit() at the end of the commit.  The trouble
with that solution is that it imposes a performance penalty on the
common case: it means the next status-dependent hg command after every
"hg commit" will be a little bit slower.  The patch here is more
complex, but it only affects performance for the uncommon case of
multiple commits in the same process.
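
As a rough illustration of that tradeoff, the sketch below counts how
many files would need a content comparison under each approach. It is a
deliberately crude cost model in Python 3 with hypothetical function
names and invented commit data; it ignores everything else dirstate does.

```python
def lookups_always(commits):
    """Simpler fix: every file touched by a commit is re-read afterwards."""
    return sum(len(files) for _second, files in commits)

def lookups_same_second(commits):
    """This patch: re-read the previous commit's files only when the next
    commit in the same process starts within the same second."""
    total = 0
    for (prev_second, prev_files), (next_second, _files) in zip(commits,
                                                                commits[1:]):
        if prev_second == next_second:
            total += len(prev_files)
    return total

# (second, files) pairs: commits spread out in time vs. a same-second burst.
spaced = [(100, ["a", "b"]), (160, ["b", "c"]), (230, ["c"])]
burst = [(100, ["a", "b"]), (100, ["b", "c"]), (100, ["c"])]

print(lookups_always(spaced), lookups_same_second(spaced))
print(lookups_always(burst), lookups_same_second(burst))
```

In this model the spaced-out commits pay nothing extra with the patch's
approach, while the simpler fix pays for every committed file each time.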

diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
--- a/mercurial/dirstate.py
+++ b/mercurial/dirstate.py
@@ -10,6 +10,7 @@
 import util, ignore, osutil, parsers, encoding
 import struct, os, stat, errno
 import cStringIO
+import tempfile
 
 _format = ">cllll"
 propertycache = util.propertycache
@@ -391,6 +392,36 @@
         self._pl = (parent, nullid)
         self._dirty = True
 
+    def getfstime(self, dir):
+        '''Return the current time to filesystem resolution.'''
+        (fd, fn) = tempfile.mkstemp(dir=dir)
+        try:
+            return int(os.fstat(fd).st_mtime)
+        finally:
+            os.close(fd)
+            os.unlink(fn)
+
+    def maybelookup(self, f, now):
+        '''Examine f to determine if it needs more work to determine its
+        true status, or whether it can be considered normal.  If more
+        work is needed, set the in-memory state to lookup; otherwise,
+        leave it alone. Thus, maybelookup() can affect the next call to
+        status(). now must be the current time to filesystem resolution
+        (see getfstime()).'''
+        e = self._map[f]
+        if e[0] == 'n' and e[3] == now:
+            # The file was last modified "simultaneously" with 'now'
+            # (i.e. within the same second for filesystems with a
+            # granularity of 1 sec). This commonly happens for at least
+            # a couple of files on 'update': the user could then change
+            # the file without changing its size within the same
+            # second. Invalidate the file's stat data in dirstate,
+            # forcing future 'status' calls to compare the contents of
+            # the file. This prevents mistakenly treating such files as
+            # clean.
+            self._map[f] = e = (e[0], 0, -1, -1)   # mark entry as 'unset'
+        return e
+
     def write(self):
         if not self._dirty:
             return
@@ -405,20 +436,8 @@
         pack = struct.pack
         write = cs.write
         write("".join(self._pl))
-        for f, e in self._map.iteritems():
-            if e[0] == 'n' and e[3] == now:
-                # The file was last modified "simultaneously" with the current
-                # write to dirstate (i.e. within the same second for file-
-                # systems with a granularity of 1 sec). This commonly happens
-                # for at least a couple of files on 'update'.
-                # The user could change the file without changing its size
-                # within the same second. Invalidate the file's stat data in
-                # dirstate, forcing future 'status' calls to compare the
-                # contents of the file. This prevents mistakenly treating such
-                # files as clean.
-                e = (e[0], 0, -1, -1)   # mark entry as 'unset'
-                self._map[f] = e
-
+        for f in self._map.iterkeys():
+            e = self.maybelookup(f, now)
             if f in copymap:
                 f = "%s\0%s" % (f, copymap[f])
             e = pack(_format, e[0], e[1], e[2], e[3], len(f))
diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -113,6 +113,11 @@
         self._datafilters = {}
         self._transref = self._lockref = self._wlockref = None
 
+        # List of files added/modified by the previous commit.  Needed
+        # to avoid a dirstate race when we do two commits in the same
+        # second in the same process while holding the repo lock.
+        self._lastcommitfiles = None
+
     def _applyrequirements(self, requirements):
         self.requirements = requirements
         self.sopener.options = {}
@@ -922,6 +927,14 @@
                 raise util.Abort(_('cannot partially commit a merge '
                                    '(do not specify files or patterns)'))
 
+            if self._lastcommitfiles:
+                files = [f for f in self._lastcommitfiles
+                         if f in self.dirstate]
+                if files:
+                    now = self.dirstate.getfstime(self.path)
+                    for f in files:
+                        self.dirstate.maybelookup(f, now)
+
             changes = self.status(match=match, clean=force)
             if force:
                 changes[0].extend(changes[6]) # mq may commit unchanged files
@@ -1020,6 +1033,7 @@
             bookmarks.update(self, parents, ret)
             for f in changes[0] + changes[1]:
                 self.dirstate.normal(f)
+            self._lastcommitfiles = changes[0] + changes[1]
             for f in changes[2]:
                 self.dirstate.forget(f)
             self.dirstate.setparents(ret)
diff --git a/tests/test-commit-multiple.t b/tests/test-commit-multiple.t
new file mode 100644
--- /dev/null
+++ b/tests/test-commit-multiple.t
@@ -0,0 +1,116 @@
+# reproduce issue2264, issue2516 (thanks to issue2516 for the original
+# script)
+
+create test repo
+  $ cat <<EOF >> $HGRCPATH
+  > [extensions]
+  > transplant =
+  > graphlog =
+  > EOF
+  $ hg init repo
+  $ cd repo
+  $ template="{rev}  {desc|firstline}  [{branches}]\n"
+
+# we need to start out with two changesets on the default branch
+# in order to avoid the cute little optimization where transplant
+# pulls rather than transplants
+add initial changesets
+  $ echo feature1 > file1
+  $ hg ci -Am"feature 1"
+  adding file1
+  $ echo feature2 >> file2
+  $ hg ci -Am"feature 2"
+  adding file2
+
+# The changes to 'bugfix' are enough to show the bug: in fact, with only
+# those changes, it's a very noisy crash ("RuntimeError: nothing
+# committed after transplant").  But if we modify a second file in the
+# transplanted changesets, the bug is much more subtle: transplant
+# silently drops the second change to 'bugfix' on the floor, and we only
+# see it when we run 'hg status' after transplanting.  Subtle data loss
+# bugs are worse than crashes, so reproduce the subtle case here.
+commit bug fixes on bug fix branch
+  $ hg branch fixes
+  marked working directory as branch fixes
+  $ echo fix1 > bugfix
+  $ echo fix1 >> file1
+  $ hg ci -Am"fix 1"
+  adding bugfix
+  $ echo fix2 > bugfix
+  $ echo fix2 >> file1
+  $ hg ci -Am"fix 2"
+  $ hg glog --template="$template"
+  @  3  fix 2  [fixes]
+  |
+  o  2  fix 1  [fixes]
+  |
+  o  1  feature 2  []
+  |
+  o  0  feature 1  []
+  
+transplant bug fixes onto release branch
+  $ hg update 0
+  1 files updated, 0 files merged, 2 files removed, 0 files unresolved
+  $ hg branch release
+  marked working directory as branch release
+  $ hg transplant 2 3
+  applying [0-9a-f]{12} (re)
+  [0-9a-f]{12} transplanted to [0-9a-f]{12} (re)
+  applying [0-9a-f]{12} (re)
+  [0-9a-f]{12} transplanted to [0-9a-f]{12} (re)
+  $ hg glog --template="$template"
+  @  5  fix 2  [release]
+  |
+  o  4  fix 1  [release]
+  |
+  | o  3  fix 2  [fixes]
+  | |
+  | o  2  fix 1  [fixes]
+  | |
+  | o  1  feature 2  []
+  |/
+  o  0  feature 1  []
+  
+  $ hg status
+  $ hg status --rev 0:4
+  M file1
+  A bugfix
+  $ hg status --rev 4:5
+  M bugfix
+  M file1
+
+now test that we fixed the bug for all scripts/extensions
+  $ cat > $TESTTMP/committwice.py <<__EOF__
+  > from mercurial import ui, hg, match, node
+  > 
+  > def replacebyte(fn, b):
+  >     f = open(fn, "rb+")
+  >     f.seek(0, 0)
+  >     f.write(b)
+  >     f.close()
+  > 
+  > repo = hg.repository(ui.ui(), '.')
+  > assert len(repo) == 6, \
+  >        "initial: len(repo) == %d, expected 6" % len(repo)
+  > try:
+  >     wlock = repo.wlock()
+  >     lock = repo.lock()
+  >     m = match.exact(repo.root, '', ['file1'])
+  >     replacebyte("file1", "x")
+  >     n = repo.commit(text="x", user="test", date=(0, 0), match=m)
+  >     print "commit 1: len(repo) == %d" % len(repo)
+  >     replacebyte("file1", "y")
+  >     n = repo.commit(text="y", user="test", date=(0, 0), match=m)
+  >     print "commit 2: len(repo) == %d" % len(repo)
+  > finally:
+  >     lock.release()
+  >     wlock.release()
+  > __EOF__
+  $ $PYTHON $TESTTMP/committwice.py
+  commit 1: len(repo) == 7
+  commit 2: len(repo) == 8
+  $ hg status
+  $ hg log --template "{rev}  {desc}  {files}\n" -r5:
+  5  fix 2  bugfix file1
+  6  x  file1
+  7  y  file1

