RFC: exact change detection for non append-only files

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Wed Nov 18 07:02:45 CST 2015


At Tue, 17 Nov 2015 18:49:38 -0600,
Matt Mackall wrote:
> 
> On Wed, 2015-11-18 at 02:10 +0900, FUJIWARA Katsunori wrote:
> > Now, 'filecache' detects changes of files by 'cachestat.__eq__()' of
> > posix.py on POSIX platform, and it examines:
> > 
> >   - st_size:
> > 
> >     This works for append-only files (like revlog) as expect in all
> >     cases (doesn't it ?)
> > 
> >     But some status files (e.g. dirstate, bookmarks and so on) may
> > not
> >     be changed in size, even if they are actually changed.
> > 
> >   - st_mtime:
> > 
> >     For non append-only files, this works as expect in many cases.
> > But
> >     'st_mtime' doesn't have enough resolution for recent computing
> > and
> >     I/O speed, even if it is represented in float (see also issue4836
> >     for more detail).
> > 
> >   - st_ino:
> > 
> >     This can compensate for 'st_mtime', because copy-on-write
> >     semantics always changes st_ino.
> > 
> > Therefore, 'st_ino' is the last bastion for change detection of
> > dirstate and so on.
> > 
> > But inode is quickly reused on some filesystems (perhaps for
> > performance reason), and it prevents examination of 'st_ino' from
> > detecting changes as expected.
> > 
> > My instant ideas to detect changes correctly even in such situation
> > are:
> > 
> >   - ignore this very very rare case :-)
> 
> Has it ever been observed?
> 
> We'd probably need the following sequence of events to trigger it:
> 
> - create file with stats (S1, M1, I1)
> - replace with file (S1, M1, I2)
> - replace again with file (S1, M1, I1) in the same second
> 
> We also have st_ctime in the tuple.

I found this issue while checking for issue4368, and this assumes
parallel processing.

    https://bz.mercurial-scm.org/show_bug.cgi?id=4368

At parallel commits, invalidation of 'repo.dirstate' at acquisition of
wlock fails to discard already loaded dirstate data very very rarely.

Then, new head is created in such case, because dirstate parents
before recent commit is reused as parents of new commit.

I guess typical reproduction steps as below:

  0. it is assumed that st_ino of dirstate is INO1
  1. process P1 acquires wlock
  2. process P2 accesses to dirstate without wlock (*1)
  3. P2 waits for wlock to be released by P1
  4. P1 changes dirstate by commit (=> st_ino dirstate is INO2)
  5. P1 releases wlock
  ---- (*2) ----
  6. P2 acquires wlock
  7. P2 invalidates 'repo.dirstate'
  8. P2 examines stat of '.hg/dirstate' at next access to 'repo.dirstate'
     - '.hg/dirstate' is reloaded as expected, if one of below is changed:
       - st_size
       - st_mtime (or st_ctime)
       - st_ino
     - 'repo.dirstate' is reused, otherwise
       for example, if
       - another process P3 acquires wlock and update dirstate at (*2),
       - little time (under st_mtime resolution) has passed from (*1) to (*2),
       - and INO1 is reused for st_ino of updated dirstate

Important point of this issue seems combination of reading dirstate
outside wlock (*1) and subsequent acquisition of wlock at (6).

Surrounding 'commands.commit()' with wlock/(s)lock scope can avoid
this kind of issue between "hg commit" processes, because dirstate is
always accessed only inside wlock scope.

But there are some code paths accessing dirstate before acquisition of
wlock (e.g. 'bailifchanged()', 'repo.stats()' and so on before actual
changing), and this may cause similar issue, if hg commands are
parallelly executed.

BTW, does "working copy sync", which was discussed in London sprint,
imply background commit ? It may cause similar issue of foreground hg
command, if so.


> > Or some other reasonable ones ?
> 
> Crazy idea: whenever we commit an atomictempfile with the same time as 
> its replacement, we could first bump its mtime 1 second into the
> future.
> 
> -- 
> Mathematics is the supreme nostalgia of our time.
> 
> 

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp


More information about the Mercurial-devel mailing list