[PATCH STABLE?] commit: avoid a dirstate race with multiple commits in the same process

Adrian Buehlmann adrian at cadifra.com
Tue Mar 15 19:58:24 CDT 2011


On 2011-03-14 19:48, Greg Ward wrote:
> # HG changeset patch
> # User Greg Ward <greg-hg at gerg.ca>
> # Date 1300128285 14400
> # Branch stable
> # Node ID 396982a5883383fc043d3f5ad78a0f9f0366bd06
> # Parent  4bfff063aed6fb4b3c8abe8a9ec51fe1e55725bf
> commit: avoid a dirstate race with multiple commits in the same process
> (issue2264, issue2516)
> 
> The race happens when two commits in a row change the same file but do
> not change its size, *if* those two commits happen in the same second
> in the same process while holding the same repo lock.  For example:
> 
>   commit i:
>     M a
>     M b
>   commit i+1:         # same process, same second, same repo lock
>     M b               # modify b without changing its size
>     M c
> 
> This first manifested in transplant, which is the most common way to
> do multiple commits in the same process. But it can manifest in any
> script or extension that does multiple commits under the same repo
> lock.
> 
> The problem was that dirstate failed to notice the changes to b when
> localrepo is doing the second commit, meaning that change gets left in
> the working directory. In the context of transplant, that means either
> a crash ("RuntimeError: nothing committed after transplant") or a
> silently inaccurate transplant, depending on whether any other files
> were modified by the second transplanted changeset.
> 
> The fix is to work a little harder in the second (and subsequent)
> commits run in the same process:
> - dirstate: factor out maybelookup() from write()
> - localrepo: add _lastcommitfiles attribute, and use it with
>   maybelookup() to force repo.status() to look harder at files
>   added/modified by the *previous* commit
> 
> Incidentally, there is a simpler fix: call dirstate.normallookup() on
> every file modified by commit() at the end of the commit.  The trouble
> with that solution is that it imposes a performance penalty on the
> common case: it means the next status-dependent hg command after every
> "hg commit" will be a little bit slower.  The patch here is more
> complex, but it only affects performance for the uncommon case of
> multiple commits in the same process.

This looks like a very careful crafted text.

I haven't yet groked all this in detail, so please forgive me for some
initial potentially stupid remarks below.

> diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
> --- a/mercurial/dirstate.py
> +++ b/mercurial/dirstate.py
> @@ -10,6 +10,7 @@
>  import util, ignore, osutil, parsers, encoding
>  import struct, os, stat, errno
>  import cStringIO
> +import tempfile
>  
>  _format = ">cllll"
>  propertycache = util.propertycache
> @@ -391,6 +392,36 @@
>          self._pl = (parent, nullid)
>          self._dirty = True
>  
> +    def getfstime(self, dir):
> +        '''Return the current time to filesystem resolution.'''

I somehow have a hard time with the grammar and semantic in this doc
sentence. Maybe:

           '''Return the filesystem's notion of "now", rounded down
           to the dirstate's one second resolution'''

> +        (fd, fn) = tempfile.mkstemp(dir=dir)
> +        try:
> +            return int(os.fstat(fd).st_mtime)
> +        finally:
> +            os.close(fd)
> +            os.unlink(fn)

[..]

> +
> +    def maybelookup(self, f, now):

I'm still pondering about a better name for this function. Proposals for
bikeshedding follow below:

       def validatemtime(self, f, now):

> +        '''Examine f to determine if it needs more work to determine its
> +        true status, or whether it can be considered normal.  If more
> +        work needed, set the in-memory state to lookup; otherwise, leave
> +        it alone. Thus, maybelookup() can affect the next call to
> +        status(). now must be the current time to filesystem resolution
> +        (see getfstime()).'''

I think something like

           '''Examine the entry for file f and invalidate it, if its
           mtime is too close to "now" to be useful.'''

should be enough together with (<cough>) my preexisting fat comment inside.

> +        e = self._map[f]
> +        if e[0] == 'n' and e[3] == now:
> +            # The file was last modified "simultaneously" to 'now'
> +            # (i.e. within the same second for filesystems with a
> +            # granularity of 1 sec). This commonly happens for at least
> +            # a couple of files on 'update': the user could then change
> +            # the file without changing its size within the same
> +            # second. Invalidate the file's stat data in dirstate,
> +            # forcing future 'status' calls to compare the contents of
> +            # the file. This prevents mistakenly treating such files as
> +            # clean.
> +            self._map[f] = e = (e[0], 0, -1, -1)   # mark entry as 'unset'
> +        return e
> +
>      def write(self):
>          if not self._dirty:
>              return
> @@ -405,20 +436,8 @@
>          pack = struct.pack
>          write = cs.write
>          write("".join(self._pl))
> -        for f, e in self._map.iteritems():
> -            if e[0] == 'n' and e[3] == now:
> -                # The file was last modified "simultaneously" with the current
> -                # write to dirstate (i.e. within the same second for file-
> -                # systems with a granularity of 1 sec). This commonly happens
> -                # for at least a couple of files on 'update'.
> -                # The user could change the file without changing its size
> -                # within the same second. Invalidate the file's stat data in
> -                # dirstate, forcing future 'status' calls to compare the
> -                # contents of the file. This prevents mistakenly treating such
> -                # files as clean.
> -                e = (e[0], 0, -1, -1)   # mark entry as 'unset'
> -                self._map[f] = e
> -
> +        for f in self._map.iterkeys():
> +            e = self.maybelookup(f, now)
>              if f in copymap:
>                  f = "%s\0%s" % (f, copymap[f])
>              e = pack(_format, e[0], e[1], e[2], e[3], len(f))
> diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
> --- a/mercurial/localrepo.py
> +++ b/mercurial/localrepo.py

[sorry, haven't looked at the rest yet. I'm such a lame duck].


More information about the Mercurial-devel mailing list