[PATCH V6] dirstate: avoid a race with multiple commits in the same process

Adrian Buehlmann adrian at cadifra.com
Mon Mar 21 06:53:29 CDT 2011


On 2011-03-20 22:47, Greg Ward wrote:
> # HG changeset patch
> # User Greg Ward <greg at gerg.ca>
> # Date 1300657269 14400
> # Node ID e750eeba9d354ae06ce260e7cca658a724027823
> # Parent  48d606d7192b6f8f4fc8071fccdc5a83843ab2d3
> dirstate: avoid a race with multiple commits in the same process
> (issue2264, issue2516)
> 
> The race happens when two commits in a row change the same file
> without changing its size, *if* those two commits happen in the same
> second in the same process while holding the same repo lock.  For
> example:
> 
>   commit 1:
>     M a
>     M b
>   commit 2:           # same process, same second, same repo lock
>     M b               # modify b without changing its size
>     M c
> 
> This first manifested in transplant, which is the most common way to
> do multiple commits in the same process. But it can manifest in any
> script or extension that does multiple commits under the same repo
> lock. (Thus, the test script tests both transplant and a custom script.)
> 
> The problem was that dirstate.status() failed to notice the change to
> b when localrepo is about to do the second commit, meaning that change
> gets left in the working directory. In the context of transplant, that
> means either a crash ("RuntimeError: nothing committed after
> transplant") or a silently inaccurate transplant, depending on whether
> any other files were modified by the second transplanted changeset.
> 
> The fix is to make status() work a little harder when we have
> previously marked files as clean (state 'normal') in the same process.
> Specifically, dirstate.normal() adds files to self._lastnormal, and
> other state-changing methods remove them. Then dirstate.status() puts
> any files in self._lastnormal into state 'lookup', which will make
> localrepository.status() read file contents to see if it has really
> changed.  So we pay a small performance penalty for the second (and
> subsequent) commits in the same process, without affecting the common
> case.  Anything that does lots of status updates and checks in the
> same process could suffer a performance hit.
> 
> Incidentally, there is a simpler fix: call dirstate.normallookup() on
> every file updated by commit() at the end of the commit.  The trouble
> with that solution is that it imposes a performance penalty on the
> common case: it means the next status-dependent hg command after every
> "hg commit" will be a little bit slower.  The patch here is more
> complex, but only affects performance for the uncommon case.
> 
> diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
> --- a/mercurial/dirstate.py
> +++ b/mercurial/dirstate.py
> @@ -49,6 +49,7 @@
>          self._rootdir = os.path.join(root, '')
>          self._dirty = False
>          self._dirtypl = False
> +        self._lastnormal = set()        # files believed to be normal
>          self._ui = ui
>  
>      @propertycache
> @@ -285,6 +286,12 @@

[so this is now

    def normal(self, f):
        '''Mark a file normal and clean.'''
        self._dirty = True
        self._addpath(f)
        s = os.lstat(self._join(f))
        self._map[f] = ('n', s.st_mode, s.st_size, int(s.st_mtime))
        if f in self._copymap:
            del self._copymap[f]

        # Right now, this file is clean: but if some code in this
        # process modifies it without changing its size before the clock
        # ticks over to the next second, then it won't be clean anymore.
        # So make sure that status() will look harder at it.
        self._lastnormal.add(f)

]

>          if f in self._copymap:
>              del self._copymap[f]
>  
> +        # Right now, this file is clean: but if some code in this
> +        # process modifies it without changing its size before the clock
> +        # ticks over to the next second, then it won't be clean anymore.
> +        # So make sure that status() will look harder at it.
> +        self._lastnormal.add(f)
> +

Hmmm.

local.status does a full compare for files that need to be looked into,
which IIUC now includes the files in self._lastnormal (see
dirstate.status() ).

localrepo.status calls self.dirstate.normal(f) for the files that
content-compare equal (list "fixup").

Maybe there should be a way to say "this file is *really* normal, we've
even compared file contents". I think such files shouldn't be added to
_lastnormal, so they are not content-compared again and again on every
localrepo.status() call.

?

>      def normallookup(self, f):
>          '''Mark a file normal, but possibly dirty.'''
>          if self._pl[1] != nullid and f in self._map:
> @@ -308,6 +315,7 @@
>          self._map[f] = ('n', 0, -1, -1)
>          if f in self._copymap:
>              del self._copymap[f]
> +        self._lastnormal.discard(f)
>  
>      def otherparent(self, f):
>          '''Mark as coming from the other parent, always dirty.'''
> @@ -319,6 +327,7 @@
>          self._map[f] = ('n', 0, -2, -1)
>          if f in self._copymap:
>              del self._copymap[f]
> +        self._lastnormal.discard(f)
>  
>      def add(self, f):
>          '''Mark a file added.'''
> @@ -327,6 +336,7 @@
>          self._map[f] = ('a', 0, -1, -1)
>          if f in self._copymap:
>              del self._copymap[f]
> +        self._lastnormal.discard(f)
>  
>      def remove(self, f):
>          '''Mark a file removed.'''
> @@ -343,6 +353,7 @@
>          self._map[f] = ('r', 0, size, 0)
>          if size == 0 and f in self._copymap:
>              del self._copymap[f]
> +        self._lastnormal.discard(f)
>  
>      def merge(self, f):
>          '''Mark a file merged.'''
> @@ -352,6 +363,7 @@
>          self._map[f] = ('m', s.st_mode, s.st_size, int(s.st_mtime))
>          if f in self._copymap:
>              del self._copymap[f]
> +        self._lastnormal.discard(f)
>  
>      def forget(self, f):
>          '''Forget a file.'''
> @@ -361,6 +373,7 @@
>              del self._map[f]
>          except KeyError:
>              self._ui.warn(_("not in dirstate: %s\n") % f)
> +        self._lastnormal.discard(f)
>  
>      def _normalize(self, path, knownpath):
>          norm_path = os.path.normcase(path)
> @@ -640,6 +653,7 @@
>          radd = removed.append
>          dadd = deleted.append
>          cadd = clean.append
> +        lastnormal = self._lastnormal.__contains__
>  
>          lnkkind = stat.S_IFLNK
>  
> @@ -672,6 +686,18 @@
>                  elif (time != int(st.st_mtime)
>                        and (mode & lnkkind != lnkkind or self._checklink)):
>                      ladd(fn)
> +                elif lastnormal(fn):
> +                    # If previously in this process we recorded that
> +                    # this file is clean, think twice: intervening code
> +                    # may have modified the file in the same second
> +                    # without changing its size. So force caller to
> +                    # check file contents. Because we're not updating
> +                    # self._map, this only affects the current process.
> +                    # That should be OK because this mainly affects
> +                    # multiple commits in the same process, and each
> +                    # commit by definition makes the committed files
> +                    # clean.
> +                    ladd(fn)
>                  elif listclean:
>                      cadd(fn)
>              elif state == 'm':


More information about the Mercurial-devel mailing list