[PATCH 1 of 5] dirstate: introduce unsureifambig to detect change of a file correctly

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Fri May 29 10:46:23 CDT 2015


At Thu, 28 May 2015 15:57:21 +0000,
Martin von Zweigbergk wrote:
> 
> I have a few questions below. It seemed clear to Pierre-Yves, so perhaps
> I'm just being slow. Feel free to ignore if you think that's the case :-P
> 
> On Wed, May 27, 2015 at 10:04 AM FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> wrote:
> 
> > # HG changeset patch
> > # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> > # Date 1432745859 -32400
> > #      Thu May 28 01:57:39 2015 +0900
> > # Node ID ab9f120295b59933d1acd72771f01b5fac8d221d
> > # Parent  6ac860f700b5cfeda232d5305963047696b869ca
> > dirstate: introduce unsureifambig to detect change of a file correctly
> >
> > To detect change of a file without redundant comparison of file
> > content, dirstate recognizes a file as certainly clean, if:
> >
> >   (1) it is already known as "normal",
> >   (2) dirstate entry for it has valid (= not "-1") timestamp, and
> >   (3) mode, size and timestamp of it on the filesystem are as same as
> >       ones expected in dirstate
> >
> > This works as expected in many cases, but doesn't in the corner case
> > that changing a file keeps mode, size and timestamp of it on the
> > filesystem.
> >
> 
> Could you elaborate a bit on what the corner case is? Maybe written down as
> a sequence of steps. Is the following correct?
> 
> 1. Initially, the file's timestamp matches dirstate's entry for the file,
> so it's correctly considered clean
> 2. The file is modified (without changing size or mode) by mercurial itself
> (e.g. as part of rebase) and the dirstate entry is updated in memory
> 3. The dirstate is written to disk
> 4. The user runs 'hg status'
> 
> I think you are saying that if step 3 happens within the same second as
> step 2, the time will be updated to -1, which will force a lookup at step 4
> and the bug does not appear. So when does the bug appear? Am I missing a
> "3.5 The file is modified again"? By the user? By mercurial? Within the
> same process?

Typical case of this issue is that step (2) occurs at timestamp N (in
units of "int" seconds) but step (3) occurs at N + 1. In this case,
timestamp N for modified files are kept in dirstate.

Even if Mercurial (and underlying layers) becomes faster and faster
than now, this issue will appear, because step (2) may occur at
"almost N + 1 but not N", like N + 0.9999999999999.... :-)


> >
> > Occasional test failure for unexpected file status is typical example
> > of this corner case: batch execution with small working directory
> > causes that `parsers.pack_dirstate()` replaces timestamp of dirstate
> > entries with "-1" in many cases, and condition (2) above is rarely
> > satisfied.
> >
> 
> I think you're saying here why the bug does *not* usually happen in tests.
> Correct? It's very easy to read the second part (after the ':') as being an
> explanation for the corner case, while it's actually an explanation for why
> it's only occasional and not frequent, IIUC.

Yes, I explained why the bug does not usually happen in tests.

I'll revise patch description to explain about typical case above and
so on clearly.


> 
> >
> > This patch adds utility function `unsureifambig()`, which invokes
> > `normallookup()` if (A) the file is already known as "normal" and (B)
> > size of it is equal to expected one.
> >
> > Otherwise, `unsureifambig()` does nothing. Keeping expected mode and
> > size of the file in dirstate is still useful to detect change of the
> > file without comparison of file content, in subsequent processing.
> >
> > diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
> > --- a/mercurial/dirstate.py
> > +++ b/mercurial/dirstate.py
> > @@ -438,6 +438,20 @@
> >          if f in self._copymap:
> >              del self._copymap[f]
> >
> > +    def unsureifambig(self, f, size):
> > +        '''Mark a file "possibly dirty", if file size isn't changed
> > +
> > +        A file ``f`` may be mis-recognized as clean, even if it is
> > +        already modified, when mode, ``size`` and timestamp of it on
> > +        the filesystem are as same as ones expected in dirstate.
> > +
> > +        To avoid such ambiguous situation, this function should be
> > +        called just after writing data into the working directory.
> > +        '''
> > +        st = self._map.get(f)
> > +        if st and st[0] == 'n' and st[2] == size:
> > +            self.normallookup(f)
> > +
> >      def otherparent(self, f):
> >          '''Mark as coming from the other parent, always dirty.'''
> >          if self._pl[1] == nullid:
> > _______________________________________________
> > Mercurial-devel mailing list
> > Mercurial-devel at selenic.com
> > https://selenic.com/mailman/listinfo/mercurial-devel
> >

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp


More information about the Mercurial-devel mailing list