[PATCH STABLE?] commit: avoid a dirstate race with multiple commits in the same process
Adrian Buehlmann
adrian at cadifra.com
Tue Mar 15 19:58:24 CDT 2011
On 2011-03-14 19:48, Greg Ward wrote:
> # HG changeset patch
> # User Greg Ward <greg-hg at gerg.ca>
> # Date 1300128285 14400
> # Branch stable
> # Node ID 396982a5883383fc043d3f5ad78a0f9f0366bd06
> # Parent 4bfff063aed6fb4b3c8abe8a9ec51fe1e55725bf
> commit: avoid a dirstate race with multiple commits in the same process
> (issue2264, issue2516)
>
> The race happens when two commits in a row change the same file but do
> not change its size, *if* those two commits happen in the same second
> in the same process while holding the same repo lock. For example:
>
> commit i:
> M a
> M b
> commit i+1: # same process, same second, same repo lock
> M b # modify b without changing its size
> M c
>
> This first manifested in transplant, which is the most common way to
> do multiple commits in the same process. But it can manifest in any
> script or extension that does multiple commits under the same repo
> lock.
>
> The problem was that dirstate failed to notice the changes to b when
> localrepo is doing the second commit, meaning that change gets left in
> the working directory. In the context of transplant, that means either
> a crash ("RuntimeError: nothing committed after transplant") or a
> silently inaccurate transplant, depending on whether any other files
> were modified by the second transplanted changeset.
>
> The fix is to work a little harder in the second (and subsequent)
> commits run in the same process:
> - dirstate: factor out maybelookup() from write()
> - localrepo: add _lastcommitfiles attribute, and use it with
> maybelookup() to force repo.status() to look harder at files
> added/modified by the *previous* commit
>
> Incidentally, there is a simpler fix: call dirstate.normallookup() on
> every file modified by commit() at the end of the commit. The trouble
> with that solution is that it imposes a performance penalty on the
> common case: it means the next status-dependent hg command after every
> "hg commit" will be a little bit slower. The patch here is more
> complex, but it only affects performance for the uncommon case of
> multiple commits in the same process.
This looks like a very careful crafted text.
I haven't yet groked all this in detail, so please forgive me for some
initial potentially stupid remarks below.
> diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
> --- a/mercurial/dirstate.py
> +++ b/mercurial/dirstate.py
> @@ -10,6 +10,7 @@
> import util, ignore, osutil, parsers, encoding
> import struct, os, stat, errno
> import cStringIO
> +import tempfile
>
> _format = ">cllll"
> propertycache = util.propertycache
> @@ -391,6 +392,36 @@
> self._pl = (parent, nullid)
> self._dirty = True
>
> + def getfstime(self, dir):
> + '''Return the current time to filesystem resolution.'''
I somehow have a hard time with the grammar and semantic in this doc
sentence. Maybe:
'''Return the filesystem's notion of "now", rounded down
to the dirstate's one second resolution'''
> + (fd, fn) = tempfile.mkstemp(dir=dir)
> + try:
> + return int(os.fstat(fd).st_mtime)
> + finally:
> + os.close(fd)
> + os.unlink(fn)
[..]
> +
> + def maybelookup(self, f, now):
I'm still pondering about a better name for this function. Proposals for
bikeshedding follow below:
def validatemtime(self, f, now):
> + '''Examine f to determine if it needs more work to determine its
> + true status, or whether it can be considered normal. If more
> + work needed, set the in-memory state to lookup; otherwise, leave
> + it alone. Thus, maybelookup() can affect the next call to
> + status(). now must be the current time to filesystem resolution
> + (see getfstime()).'''
I think something like
'''Examine the entry for file f and invalidate it, if its
mtime is too close to "now" to be useful.'''
should be enough together with (<cough>) my preexisting fat comment inside.
> + e = self._map[f]
> + if e[0] == 'n' and e[3] == now:
> + # The file was last modified "simultaneously" to 'now'
> + # (i.e. within the same second for filesystems with a
> + # granularity of 1 sec). This commonly happens for at least
> + # a couple of files on 'update': the user could then change
> + # the file without changing its size within the same
> + # second. Invalidate the file's stat data in dirstate,
> + # forcing future 'status' calls to compare the contents of
> + # the file. This prevents mistakenly treating such files as
> + # clean.
> + self._map[f] = e = (e[0], 0, -1, -1) # mark entry as 'unset'
> + return e
> +
> def write(self):
> if not self._dirty:
> return
> @@ -405,20 +436,8 @@
> pack = struct.pack
> write = cs.write
> write("".join(self._pl))
> - for f, e in self._map.iteritems():
> - if e[0] == 'n' and e[3] == now:
> - # The file was last modified "simultaneously" with the current
> - # write to dirstate (i.e. within the same second for file-
> - # systems with a granularity of 1 sec). This commonly happens
> - # for at least a couple of files on 'update'.
> - # The user could change the file without changing its size
> - # within the same second. Invalidate the file's stat data in
> - # dirstate, forcing future 'status' calls to compare the
> - # contents of the file. This prevents mistakenly treating such
> - # files as clean.
> - e = (e[0], 0, -1, -1) # mark entry as 'unset'
> - self._map[f] = e
> -
> + for f in self._map.iterkeys():
> + e = self.maybelookup(f, now)
> if f in copymap:
> f = "%s\0%s" % (f, copymap[f])
> e = pack(_format, e[0], e[1], e[2], e[3], len(f))
> diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
> --- a/mercurial/localrepo.py
> +++ b/mercurial/localrepo.py
[sorry, haven't looked at the rest yet. I'm such a lame duck].
More information about the Mercurial-devel
mailing list