RFC: exact change detection for non append-only files

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Wed Nov 18 08:41:19 CST 2015


At Tue, 17 Nov 2015 09:58:56 -0800,
Pierre-Yves David wrote:
> 
> 
> 
> On 11/17/2015 09:10 AM, FUJIWARA Katsunori wrote:
> >
> > Currently, 'filecache' detects file changes via 'cachestat.__eq__()' in
> > posix.py on POSIX platforms, which examines:
> >
> >    - st_size:
> >
> >      This works as expected for append-only files (like revlog) in all
> >      cases (doesn't it?)
> >
> >      But some status files (e.g. dirstate, bookmarks and so on) may keep
> >      the same size, even if their content has actually changed.
> >
> >    - st_mtime:
> >
> >      For non append-only files, this works as expected in many cases. But
> >      'st_mtime' doesn't have enough resolution for today's computing and
> >      I/O speeds, even when it is represented as a float (see also
> >      issue4836 for more detail).
> >
> >    - st_ino:
> >
> >      This can compensate for 'st_mtime', because copy-on-write
> >      semantics always change st_ino.
> >
> > Therefore, 'st_ino' is the last line of defense for detecting changes
> > to dirstate and similar files.
> >
> > But inodes are quickly reused on some filesystems (perhaps for
> > performance reasons), which prevents the 'st_ino' check from
> > detecting changes as expected.
> >
> > My quick ideas for detecting changes correctly even in such situations
> > are:
> >
> >    - ignore this very very rare case :-)
> >
> >      Because, for this issue to occur, the inode previously used for
> >      status file X must be reused for X again.
> >
> >    - writer: save also hash of data at writing data out
> >      reader: check hash, if 'st_ino' can't detect changes
> >
> >      (e.g. '.hg/dirstate.hash' for '.hg/dirstate')
> >
> >      This requires reading the whole data file to calculate the hash
> >      value, which can easily decrease performance.
> >
> >    - writer: increment and write a "generation id" when writing data out
> >      reader: check the "generation id" if 'st_ino' can't detect changes
> >
> >      (e.g. '.hg/dirstate.genid' for '.hg/dirstate')
> 
> Writing two different files will be subject to race conditions. :-/

Those supplemental files mainly aim at reloading '.hg/dirstate' as
expected on the first access to 'repo.dirstate' after acquiring wlock
(= 'repo.invalidatedirstate()'). Once wlock is acquired, dirstate
itself and its supplemental file are protected from being changed by
other processes.

On the other hand, outside the wlock scope, overlooking dirstate
changes shouldn't be a fatal problem :-)
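For illustration, here is a minimal Python sketch of the "generation id"
idea, assuming the caller already holds wlock. All names here
(GENID_SUFFIX, write_with_genid, snapshot) are hypothetical and are not
Mercurial's API. The stat comparison only mimics the three fields
discussed above; it is not the real 'cachestat' in posix.py, and the
lock itself is not implemented.

```python
import os
import tempfile

# Hypothetical suffix for the side file, e.g. '.hg/dirstate.genid'.
GENID_SUFFIX = '.genid'


def _atomic_write(path, data):
    """Write data to a temp file in the same directory, then rename it
    over path, so the file is replaced rather than modified in place."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or '.')
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def read_genid(path):
    """Return the generation id stored next to path, or 0 if none exists."""
    try:
        with open(path + GENID_SUFFIX, 'rb') as f:
            return int(f.read())
    except FileNotFoundError:
        return 0


def write_with_genid(path, data):
    """Writer side. Assumes the caller holds the lock (wlock), so that no
    other process writes path or its side file concurrently."""
    genid = read_genid(path) + 1
    _atomic_write(path, data)
    _atomic_write(path + GENID_SUFFIX, str(genid).encode('ascii'))


class snapshot(object):
    """Reader side: remember the stat fields and the generation id."""

    def __init__(self, path):
        st = os.stat(path)
        # The three fields discussed above: size, mtime and inode.
        self.stat = (st.st_size, st.st_mtime_ns, st.st_ino)
        self.genid = read_genid(path)

    def changed(self, path):
        current = snapshot(path)
        if self.stat != current.stat:
            return True
        # Stat fields are identical (same size, coarse mtime, reused
        # inode): fall back to comparing the generation id.
        return self.genid != current.genid
```

Because the data file is written before the side file, a reader outside
wlock may briefly see new data with an old generation id. Under wlock
this cannot happen, and outside it the consequence is only a missed
reload.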


> (Nice find, I'll be thinking about a work around here)

I'm looking forward to better ones!


BTW, could the "cache-on-the-side" approach discussed in the thread
below help resolve this issue?

    http://thread.gmane.org/gmane.comp.version-control.mercurial.devel/85197


> -- 
> Pierre-Yves David
> 
----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp

