[PATCH STABLE] localrepo: discard objects in _filecache at transaction failure (issue4876)

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Thu Oct 22 18:10:08 CDT 2015


At Thu, 22 Oct 2015 22:27:23 +0100,
Yuya Nishihara wrote:
> 
> On Thu, 22 Oct 2015 06:40:17 -0400, Augie Fackler wrote:
> > On Thu, Oct 22, 2015 at 05:20:17AM +0900, FUJIWARA Katsunori wrote:
> > > # HG changeset patch
> > > # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> > > # Date 1445458662 -32400
> > > #      Thu Oct 22 05:17:42 2015 +0900
> > > # Branch stable
> > > # Node ID 865862bdaa59319b0304864ca20066a81529830d
> > > # Parent  b57e5bfaad7ccc9d45884717115ba9a36d6b7b41
> > > localrepo: discard objects in _filecache at transaction failure (issue4876)
> > >
> > > 'repo.invalidate()' deletes 'filecache'-ed properties by
> > > 'filecache.__delete__()' below via 'delattr(unfiltered, k)'. But
> > > cached objects are still kept in 'repo._filecache'.
> > >
> > >     def __delete__(self, obj):
> > >         try:
> > >             del obj.__dict__[self.name]
> > >         except KeyError:
> > >             raise AttributeError(self.name)
> 
> Yes, it means "next time, check if files have been modified".
> 
> > > If 'repo' object is reused even after failure of command execution,
> > > referring 'filecache'-ed property may reuse one kept in
> > > 'repo._filecache', even if reloading from a file is expected.
> > >
> > > Executing command sequence on command server is a typical case of this
> > > situation (a710936c3037 also tried to fix this issue).
> 
> If I understand it, a710936c3037 does the inverse. ce.refresh() updates only
> cachestat, which means "next time, you don't need to reload that file because
> the in-memory data should be identical".

Oops, not a710936c3037 but 5c0f5db65c6b (of you !), sorry.

Can it be fixed in-flight ? or should I resend revised one (if changes
themselves are reasonable) ?


> > >   1. start a command execution
> > >
> > >   2. 'changelog.delayupdate()' is invoked in a transaction scope
> > >
> > >      This replaces own 'opener' by '_divertopener()' for additional
> > >      accessing to '00changelog.i.a'
> > >
> > >   3. transaction is aborted, and command (1) execution is ended
> > >
> > >      After 'repo.invalidate()' at releasing store lock, changelog
> > >      object above (= 'opener' of it is still replaced) is deleted from
> > >      'repo.__dict__', but still kept in 'repo._filecache'
> > >
> > >   4. start next command execution with same 'repo'
> > >
> > >   5. referring 'repo.changelog' may reuse changelog object kept in
> > >      'repo._filecache' according to timestamp
> > >
> > >      Then, "No such file or directory" error occurs for
> > >      '00changelog.i.a', which is already removed at (3)
> > >
> > > This patch discards objects in '_filecache' at transaction failure.
> > >
> > > 'repo.invalidate()' at failure of "hg qpush" is removed, because now
> > > it is redundant.
> > >
> > > This patch doesn't make 'repo.invalidate()' always discard objects in
> > > '_filecache', because 'repo.invalidate()' is invoked also at unlocking
> > > store lock.
> > >
> > >   - "always discard objects in filecache at unlocking" may cause
> > >     serious performance problem for subsequent procedures at normal
> > >     execution
> > >
> > >   - but it is impossible to "discard objects in filecache at unlocking
> > >     only at failure", because 'releasefn' of lock can't know whether a
> > >     lock scope is terminated normally or not
> > >
> > >     BTW, using "with" statement described in PEP343 for lock may
> > >     resolve this ?
> > >
> > > IMHO, fundamental cause of this issue seems that truncation of
> > > '00changelog.i' occurs at failure of transaction, even though newly
> > > added revisions exist only in '00changelog.i.a' (aka "pending
> > > file"). This truncation implies useless updating 'st_mtime' of
> > > '00changelog.i', even though size of '00changelog.i' isn't changed by
> > > truncation itself.
> > >
> > > In order to avoid this redundant truncation in the future,
> > > 'revlog._writeentry()' and so on may have to be changed around
> > > 'transaction.add()' invocations.
> > >
> > > diff --git a/hgext/mq.py b/hgext/mq.py
> > > --- a/hgext/mq.py
> > > +++ b/hgext/mq.py
> > > @@ -847,7 +847,6 @@
> > >                  try:
> > >                      tr.abort()
> > >                  finally:
> > > -                    repo.invalidate()
> > >                      self.invalidate()
> > >                  raise
> > >          finally:
> > > diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
> > > --- a/mercurial/localrepo.py
> > > +++ b/mercurial/localrepo.py
> > > @@ -1014,6 +1014,8 @@
> > >                  # out) in this transaction
> > >                  repo.vfs.rename('journal.dirstate', 'dirstate')
> > >
> > > +                repo.invalidate(clearfilecache=True)
> > > +
> > >          tr = transaction.transaction(rp, self.svfs, vfsmap,
> > >                                       "journal",
> > >                                       "undo",
> > > @@ -1205,13 +1207,15 @@
> > >                      pass
> > >              delattr(self.unfiltered(), 'dirstate')
> > >
> > > -    def invalidate(self):
> > > +    def invalidate(self, clearfilecache=False):
> > >          unfiltered = self.unfiltered() # all file caches are stored unfiltered
> > > -        for k in self._filecache:
> > > +        for k in self._filecache.keys():
> > >              # dirstate is invalidated separately in invalidatedirstate()
> > >              if k == 'dirstate':
> > >                  continue
> > >
> > > +            if clearfilecache:
> > > +                del self._filecache[k]
> > 
> > I can't comment on the correctness of the patch (it looks
> > superficially reasonable to me), but could we do self._filecache = {}
> > instead of del-ing each filecache entry individually?
> 
> Perhaps we could, but we'll need a trick to keep _filecache['dirstate'] anyway.

Yes, it is reason that we can't simply use "self._filecache = {}".

"Specific code path" like below seems not useful, because dirstate is
cached in many many cases.

    if 'dirstate' not in self._filecache:
        self._filecache = {}
    else:
        # existing code path

Are there any other ideas ?

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp


More information about the Mercurial-devel mailing list