[PATCH] dirstate: avoid a race with multiple commits in the same process

Greg Ward greg at gerg.ca
Fri Mar 18 21:30:01 CDT 2011


On 18 March 2011, Matt Mackall said:
> On Thu, 2011-03-17 at 21:37 -0400, Greg Ward wrote:
> >  
> > +            if lastnormal(fn):
> > +                # If previously in this process we decided that this
> > +                # file is clean (state 'normal'), think twice:
> > +                # intervening code may have modified the file in the
> > +                # same second without changing its size. So force caller
> > +                # to check file contents.
> > +                self.normallookup(fn)
> >              state, mode, size, time = dmap[fn]
> 
> This piece is suboptimal. We'll end up doing an expensive lookup on
> every file from the last checkin regardless of whether it visibly
> changed.
> 
> I think this wants to be right before 'elif clean' about 10 lines down.

Ahhhh, I see.  OK, here's what I've got right now; this is just the
fragment of patch in dirstate.status() (the rest is the same):

"""
@@ -640,6 +648,7 @@
         radd = removed.append
         dadd = deleted.append
         cadd = clean.append
+        lastnormal = self._lastnormal.__contains__
 
         lnkkind = stat.S_IFLNK
 
@@ -672,6 +681,14 @@
                 elif (time != int(st.st_mtime)
                       and (mode & lnkkind != lnkkind or self._checklink)):
                     ladd(fn)
+                elif lastnormal(fn):
+                    # If previously in this process we recorded that this
+                    # file is 'normal' (clean), think twice: intervening
+                    # code may have modified the file in the same second
+                    # without changing its size. So force caller to check
+                    # file contents.
+                    ladd(fn)
+                    #self.normallookup(fn)
                 elif listclean:
                     cadd(fn)
             elif state == 'm':
"""

That commented-out call to self.normallookup() is there because I'm
torn about whether to include it.  Specifically: nothing else in
status() updates self._map, so doing it in this case is inconsistent.
But if we *don't* update self._map, then we'll write a dirstate saying
this file is normal, when we've just established that it needs lookup.
So if we have out-of-process activity that updates the file without
changing its size and calls "hg status", all within the same second,
it could incorrectly think that the file is clean.  But AFAICT nothing
in repo.status() or commands.status() actually writes the dirstate, so
worrying about what gets written to the dirstate is pointless.

Aiieee!  I hope you're less confused than I am.  What am I missing?

(For the record, my new test-commit-multiple.t script passes with and
without the call to self.normallookup(), with 20-30 runs after each
change to the code.)

        Greg
-- 
Greg Ward                                http://www.gerg.ca/
Question authority!


More information about the Mercurial-devel mailing list