[PATCH] dirstate: avoid a race with multiple commits in the same process

Matt Mackall mpm at selenic.com
Fri Mar 18 23:12:22 CDT 2011


On Fri, 2011-03-18 at 22:30 -0400, Greg Ward wrote:
> On 18 March 2011, Matt Mackall said:
> > On Thu, 2011-03-17 at 21:37 -0400, Greg Ward wrote:
> > >  
> > > +            if lastnormal(fn):
> > > +                # If previously in this process we decided that this
> > > +                # file is clean (state 'normal'), think twice:
> > > +                # intervening code may have modified the file in the
> > > +                # same second without changing its size. So force caller
> > > +                # to check file contents.
> > > +                self.normallookup(fn)
> > >              state, mode, size, time = dmap[fn]
> > 
> > This piece is suboptimal. We'll end up doing an expensive lookup on
> > every file from the last checkin regardless of whether it visibly
> > changed.
> > 
> > I think this wants to be right before 'elif clean' about 10 lines down.
> 
> Ahhhh, I see.  OK, here's what I've got right now; this is just the
> fragment of patch in dirstate.status() (the rest is the same):
> 
> """
> @@ -640,6 +648,7 @@
>          radd = removed.append
>          dadd = deleted.append
>          cadd = clean.append
> +        lastnormal = self._lastnormal.__contains__
>  
>          lnkkind = stat.S_IFLNK
>  
> @@ -672,6 +681,14 @@
>                  elif (time != int(st.st_mtime)
>                        and (mode & lnkkind != lnkkind or self._checklink)):
>                      ladd(fn)
> +                elif lastnormal(fn):
> +                    # If previously in this process we recorded that this
> +                    # file is 'normal' (clean), think twice: intervening
> +                    # code may have modified the file in the same second
> +                    # without changing its size. So force caller to check
> +                    # file contents.
> +                    ladd(fn)
> +                    #self.normallookup(fn)
>                  elif listclean:
>                      cadd(fn)
>              elif state == 'm':
> """
> 
> That commented-out call to self.normallookup() is there because I'm
> torn about whether to include it.  Specifically: nothing else in
> status() updates self._map, so doing it in this case is inconsistent.
> But if we *don't* update self._map, then we'll write a dirstate saying
> this file is normal, when we've just established that it needs lookup.
> So if we have out-of-process activity that updates the file without
> changing its size and calls "hg status", all within the same second,
> it could incorrectly think that the file is clean.  But AFAICT nothing
> in repo.status() or commands.status() actually writes the dirstate, so
> worrying about what gets written to the dirstate is pointless.
> 
> Aiieee!  I hope you're less confused than I am.  What am I missing?

Yeah, that's getting pretty complicated.

I think the secret might be that we actually do try to update the
dirstate of the lookup list in localrepo.py:status. The other piece is
that the act of committing in transplant is actually making things
clean. So long as no one modifies the file between the time the commit
happens and the dirstate is written, we do ok.

I'm not entirely sure this covers all the bases, but I'm not
super-concerned with dealing with races against other processes. 
Those races can't be entirely closed.

-- 
Mathematics is the supreme nostalgia of our time.




More information about the Mercurial-devel mailing list