[PATCH V6] dirstate: avoid a race with multiple commits in the same process
Adrian Buehlmann
adrian at cadifra.com
Mon Mar 21 06:53:29 CDT 2011
On 2011-03-20 22:47, Greg Ward wrote:
> # HG changeset patch
> # User Greg Ward <greg at gerg.ca>
> # Date 1300657269 14400
> # Node ID e750eeba9d354ae06ce260e7cca658a724027823
> # Parent 48d606d7192b6f8f4fc8071fccdc5a83843ab2d3
> dirstate: avoid a race with multiple commits in the same process
> (issue2264, issue2516)
>
> The race happens when two commits in a row change the same file
> without changing its size, *if* those two commits happen in the same
> second in the same process while holding the same repo lock. For
> example:
>
> commit 1:
> M a
> M b
> commit 2: # same process, same second, same repo lock
> M b # modify b without changing its size
> M c
>
> This first manifested in transplant, which is the most common way to
> do multiple commits in the same process. But it can manifest in any
> script or extension that does multiple commits under the same repo
> lock. (Thus, the test script tests both transplant and a custom script.)
>
> The problem was that dirstate.status() failed to notice the change to
> b when localrepo is about to do the second commit, meaning that change
> gets left in the working directory. In the context of transplant, that
> means either a crash ("RuntimeError: nothing committed after
> transplant") or a silently inaccurate transplant, depending on whether
> any other files were modified by the second transplanted changeset.
>
> The fix is to make status() work a little harder when we have
> previously marked files as clean (state 'normal') in the same process.
> Specifically, dirstate.normal() adds files to self._lastnormal, and
> other state-changing methods remove them. Then dirstate.status() puts
> any files in self._lastnormal into state 'lookup', which will make
> localrepository.status() read file contents to see if it has really
> changed. So we pay a small performance penalty for the second (and
> subsequent) commits in the same process, without affecting the common
> case. Anything that does lots of status updates and checks in the
> same process could suffer a performance hit.
>
> Incidentally, there is a simpler fix: call dirstate.normallookup() on
> every file updated by commit() at the end of the commit. The trouble
> with that solution is that it imposes a performance penalty on the
> common case: it means the next status-dependent hg command after every
> "hg commit" will be a little bit slower. The patch here is more
> complex, but only affects performance for the uncommon case.
>
> diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
> --- a/mercurial/dirstate.py
> +++ b/mercurial/dirstate.py
> @@ -49,6 +49,7 @@
> self._rootdir = os.path.join(root, '')
> self._dirty = False
> self._dirtypl = False
> + self._lastnormal = set() # files believed to be normal
> self._ui = ui
>
> @propertycache
> @@ -285,6 +286,12 @@
[so this is now
def normal(self, f):
'''Mark a file normal and clean.'''
self._dirty = True
self._addpath(f)
s = os.lstat(self._join(f))
self._map[f] = ('n', s.st_mode, s.st_size, int(s.st_mtime))
if f in self._copymap:
del self._copymap[f]
# Right now, this file is clean: but if some code in this
# process modifies it without changing its size before the clock
# ticks over to the next second, then it won't be clean anymore.
# So make sure that status() will look harder at it.
self._lastnormal.add(f)
]
> if f in self._copymap:
> del self._copymap[f]
>
> + # Right now, this file is clean: but if some code in this
> + # process modifies it without changing its size before the clock
> + # ticks over to the next second, then it won't be clean anymore.
> + # So make sure that status() will look harder at it.
> + self._lastnormal.add(f)
> +
Hmmm.
local.status does a full compare for files that need to be looked into,
which IIUC now includes the files in self._lastnormal (see
dirstate.status() ).
localrepo.status calls self.dirstate.normal(f) for the files that
content-compare equal (list "fixup").
Maybe there should be a way to say "this file is *really* normal, we've
even compared file contents". I think such files shouldn't be added to
_lastnormal, so they are not content-compared again and again on every
localrepo.status() call.
?
> def normallookup(self, f):
> '''Mark a file normal, but possibly dirty.'''
> if self._pl[1] != nullid and f in self._map:
> @@ -308,6 +315,7 @@
> self._map[f] = ('n', 0, -1, -1)
> if f in self._copymap:
> del self._copymap[f]
> + self._lastnormal.discard(f)
>
> def otherparent(self, f):
> '''Mark as coming from the other parent, always dirty.'''
> @@ -319,6 +327,7 @@
> self._map[f] = ('n', 0, -2, -1)
> if f in self._copymap:
> del self._copymap[f]
> + self._lastnormal.discard(f)
>
> def add(self, f):
> '''Mark a file added.'''
> @@ -327,6 +336,7 @@
> self._map[f] = ('a', 0, -1, -1)
> if f in self._copymap:
> del self._copymap[f]
> + self._lastnormal.discard(f)
>
> def remove(self, f):
> '''Mark a file removed.'''
> @@ -343,6 +353,7 @@
> self._map[f] = ('r', 0, size, 0)
> if size == 0 and f in self._copymap:
> del self._copymap[f]
> + self._lastnormal.discard(f)
>
> def merge(self, f):
> '''Mark a file merged.'''
> @@ -352,6 +363,7 @@
> self._map[f] = ('m', s.st_mode, s.st_size, int(s.st_mtime))
> if f in self._copymap:
> del self._copymap[f]
> + self._lastnormal.discard(f)
>
> def forget(self, f):
> '''Forget a file.'''
> @@ -361,6 +373,7 @@
> del self._map[f]
> except KeyError:
> self._ui.warn(_("not in dirstate: %s\n") % f)
> + self._lastnormal.discard(f)
>
> def _normalize(self, path, knownpath):
> norm_path = os.path.normcase(path)
> @@ -640,6 +653,7 @@
> radd = removed.append
> dadd = deleted.append
> cadd = clean.append
> + lastnormal = self._lastnormal.__contains__
>
> lnkkind = stat.S_IFLNK
>
> @@ -672,6 +686,18 @@
> elif (time != int(st.st_mtime)
> and (mode & lnkkind != lnkkind or self._checklink)):
> ladd(fn)
> + elif lastnormal(fn):
> + # If previously in this process we recorded that
> + # this file is clean, think twice: intervening code
> + # may have modified the file in the same second
> + # without changing its size. So force caller to
> + # check file contents. Because we're not updating
> + # self._map, this only affects the current process.
> + # That should be OK because this mainly affects
> + # multiple commits in the same process, and each
> + # commit by definition makes the committed files
> + # clean.
> + ladd(fn)
> elif listclean:
> cadd(fn)
> elif state == 'm':
More information about the Mercurial-devel
mailing list