[PATCH STABLE] localrepo: discard objects in _filecache at transaction failure (issue4876)
FUJIWARA Katsunori
foozy at lares.dti.ne.jp
Thu Oct 22 18:10:08 CDT 2015
At Thu, 22 Oct 2015 22:27:23 +0100,
Yuya Nishihara wrote:
>
> On Thu, 22 Oct 2015 06:40:17 -0400, Augie Fackler wrote:
> > On Thu, Oct 22, 2015 at 05:20:17AM +0900, FUJIWARA Katsunori wrote:
> > > # HG changeset patch
> > > # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> > > # Date 1445458662 -32400
> > > # Thu Oct 22 05:17:42 2015 +0900
> > > # Branch stable
> > > # Node ID 865862bdaa59319b0304864ca20066a81529830d
> > > # Parent b57e5bfaad7ccc9d45884717115ba9a36d6b7b41
> > > localrepo: discard objects in _filecache at transaction failure (issue4876)
> > >
> > > 'repo.invalidate()' deletes 'filecache'-ed properties by
> > > 'filecache.__delete__()' below via 'delattr(unfiltered, k)'. But
> > > cached objects are still kept in 'repo._filecache'.
> > >
> > > def __delete__(self, obj):
> > > try:
> > > del obj.__dict__[self.name]
> > > except KeyError:
> > > raise AttributeError(self.name)
>
> Yes, it means "next time, check if files have been modified".
>
> > > If 'repo' object is reused even after failure of command execution,
> > > referring 'filecache'-ed property may reuse one kept in
> > > 'repo._filecache', even if reloading from a file is expected.
> > >
> > > Executing command sequence on command server is a typical case of this
> > > situation (a710936c3037 also tried to fix this issue).
>
> If I understand it, a710936c3037 does the inverse. ce.refresh() updates only
> cachestat, which means "next time, you don't need to reload that file because
> the in-memory data should be identical".
Oops, not a710936c3037 but 5c0f5db65c6b (of you !), sorry.
Can it be fixed in-flight ? or should I resend revised one (if changes
themselves are reasonable) ?
> > > 1. start a command execution
> > >
> > > 2. 'changelog.delayupdate()' is invoked in a transaction scope
> > >
> > > This replaces own 'opener' by '_divertopener()' for additional
> > > accessing to '00changelog.i.a'
> > >
> > > 3. transaction is aborted, and command (1) execution is ended
> > >
> > > After 'repo.invalidate()' at releasing store lock, changelog
> > > object above (= 'opener' of it is still replaced) is deleted from
> > > 'repo.__dict__', but still kept in 'repo._filecache'
> > >
> > > 4. start next command execution with same 'repo'
> > >
> > > 5. referring 'repo.changelog' may reuse changelog object kept in
> > > 'repo._filecache' according to timestamp
> > >
> > > Then, "No such file or directory" error occurs for
> > > '00changelog.i.a', which is already removed at (3)
> > >
> > > This patch discards objects in '_filecache' at transaction failure.
> > >
> > > 'repo.invalidate()' at failure of "hg qpush" is removed, because now
> > > it is redundant.
> > >
> > > This patch doesn't make 'repo.invalidate()' always discard objects in
> > > '_filecache', because 'repo.invalidate()' is invoked also at unlocking
> > > store lock.
> > >
> > > - "always discard objects in filecache at unlocking" may cause
> > > serious performance problem for subsequent procedures at normal
> > > execution
> > >
> > > - but it is impossible to "discard objects in filecache at unlocking
> > > only at failure", because 'releasefn' of lock can't know whether a
> > > lock scope is terminated normally or not
> > >
> > > BTW, using "with" statement described in PEP343 for lock may
> > > resolve this ?
> > >
> > > IMHO, fundamental cause of this issue seems that truncation of
> > > '00changelog.i' occurs at failure of transaction, even though newly
> > > added revisions exist only in '00changelog.i.a' (aka "pending
> > > file"). This truncation implies useless updating 'st_mtime' of
> > > '00changelog.i', even though size of '00changelog.i' isn't changed by
> > > truncation itself.
> > >
> > > In order to avoid this redundant truncation in the future,
> > > 'revlog._writeentry()' and so on may have to be changed around
> > > 'transaction.add()' invocations.
> > >
> > > diff --git a/hgext/mq.py b/hgext/mq.py
> > > --- a/hgext/mq.py
> > > +++ b/hgext/mq.py
> > > @@ -847,7 +847,6 @@
> > > try:
> > > tr.abort()
> > > finally:
> > > - repo.invalidate()
> > > self.invalidate()
> > > raise
> > > finally:
> > > diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
> > > --- a/mercurial/localrepo.py
> > > +++ b/mercurial/localrepo.py
> > > @@ -1014,6 +1014,8 @@
> > > # out) in this transaction
> > > repo.vfs.rename('journal.dirstate', 'dirstate')
> > >
> > > + repo.invalidate(clearfilecache=True)
> > > +
> > > tr = transaction.transaction(rp, self.svfs, vfsmap,
> > > "journal",
> > > "undo",
> > > @@ -1205,13 +1207,15 @@
> > > pass
> > > delattr(self.unfiltered(), 'dirstate')
> > >
> > > - def invalidate(self):
> > > + def invalidate(self, clearfilecache=False):
> > > unfiltered = self.unfiltered() # all file caches are stored unfiltered
> > > - for k in self._filecache:
> > > + for k in self._filecache.keys():
> > > # dirstate is invalidated separately in invalidatedirstate()
> > > if k == 'dirstate':
> > > continue
> > >
> > > + if clearfilecache:
> > > + del self._filecache[k]
> >
> > I can't comment on the correctness of the patch (it looks
> > superficially reasonable to me), but could we do self._filecache = {}
> > instead of del-ing each filecache entry individually?
>
> Perhaps we could, but we'll need a trick to keep _filecache['dirstate'] anyway.
Yes, it is reason that we can't simply use "self._filecache = {}".
"Specific code path" like below seems not useful, because dirstate is
cached in many many cases.
if 'dirstate' not in self._filecache:
self._filecache = {}
else:
# existing code path
Are there any other ideas ?
----------------------------------------------------------------------
[FUJIWARA Katsunori] foozy at lares.dti.ne.jp
More information about the Mercurial-devel
mailing list