[PATCH 1 of 2 STABLE] hooks: always include HG_PENDING

Thu Nov 5 13:18:58 CST 2015

At Wed, 4 Nov 2015 17:11:12 -0800,
Durham Goode wrote:
> 
> On 11/4/15 4:26 PM, FUJIWARA Katsunori wrote:
> > At Tue, 3 Nov 2015 17:19:37 -0800,
> > Durham Goode wrote:
> >> # HG changeset patch
> >> # User Durham Goode <durham at fb.com>
> >> # Date 1446598693 28800
> >> #      Tue Nov 03 16:58:13 2015 -0800
> >> # Branch stable
> >> # Node ID 0a772553285d61aeea3226f74698223b09fae7ac
> >> # Parent  58b7f3e93bbab749ab16c09df12aae5ba7880708
> >> hooks: always include HG_PENDING
> >>
> >> Previously we would only include HG_PENDING in the hook args if the
> >> transaction's writepending() actually wrote something. This is a bad criteria,
> >> since it's possible that a previous call to writepending() wrote stuff and the
> >> hooks want to still see that.
> >>
> >> The solution is to always have hooks execute within the scope of the pending
> >> changes by always putting HG_PENDING in the environment.
> >>
> >> diff --git a/mercurial/hook.py b/mercurial/hook.py
> >> --- a/mercurial/hook.py
> >> +++ b/mercurial/hook.py
> >> @@ -122,7 +122,8 @@ def _exthook(ui, repo, name, cmd, args,
> >>       # make in-memory changes visible to external process
> >>       tr = repo.currenttransaction()
> >>       repo.dirstate.write(tr)
> >> -    if tr and tr.writepending():
> >> +    if tr:
> >> +        tr.writepending()
> >>           env['HG_PENDING'] = repo.root
> >>   
> >>       for k, v in args.iteritems():
> > 'transaction.writepending()' saves whether there is any pending data
> > or not into 'self._anypending'.
> >
> >      https://selenic.com/repo/hg/file/e5a1df51bb25/mercurial/transaction.py#l350
> >
> >      @active
> >      def writepending(self):
> >          '''write pending file to temporary version
> >
> >          This is used to allow hooks to view a transaction before commit'''
> >          categories = sorted(self._pendingcallback)
> >          for cat in categories:
> >              # remove callback since the data will have been flushed
> >              any = self._pendingcallback.pop(cat)(self)
> >              self._anypending = self._anypending or any
> >          self._anypending |= self._generatefiles(suffix='.pending')
> >          return self._anypending
> >
> > It should return True as expected, if pending data has been written out
> > once.
> >
> > The root cause of this issue seems that 'transaction.writepending()'
> > removes pending callback invoked once, even if pending callback didn't
> > actually write changes out yet.
> >
> > This removal of callbacks is reasonable, if it is assumed that pending
> > callback may not be "idempotent" (e.g. writing into the file opened
> > with "append" mode).
> >
> > But on the other hand, this removal may cause similar issue in other
> > code paths = other hook combinations (or introducing new hooks).
> >
> > To avoid such similar issues:
> >
> >    - invoke 'changelog.delayupdate()' explicitly before
> >      'transaction.writepending()' for external hook invocation, like
> >      'dirstate.write(tr)' above
> >
> >      e.g. in _exthook()@hook.py
> >
> >          tr = repo.currenttransaction()
> >          repo.dirstate.write(tr)
> >          if tr:
> >              repo.changelog.delayupdate(tr)
> >              if tr.writepending()
> >                  env['HG_PENDING'] = repo.root
> >
> >    - invoke 'changelog.delayupdate()' internally in changelog by
> >      overriding functions below, which open files in 'w'/'a' mode by
> >      'self.opener'
> >
> >      - revlog.addgroup
> >      - revlog.addrevision
> >      - revlog.checkinlinesize
> >
> >      e.g.:
> >
> >      def addrevision(self, text, transaction, link, p1, p2, cachedelta=None,
> >                      node=None)
> >          self.delayupdate(transaction)
> >          return super(changelog, self).addrevision(.....)
> >
> > The former still needs 'changelog.delayupdate()' before starting
> > changes to setup opener (and other internal status) of changelog, but
> > the latter doesn't.
> >
> > I think the latter is reasonable. (yes, the former is just for
> > comparison :-)) How about these ?
> I think your second proposal sounds reasonable.  And will catch any 
> cases that are currently missing it.
> >
> >
> > BTW, #1 of this series causes that HG_PENDING is always passed to
> > external hooks while transaction running, even if there is no pending
> > data. Is this reasonable ? I think that this should be backed out to
> > avoid unintentional (trial) accessing to pending files.

> Wouldn't we always want hooks to be able to see the current state of the 
> repo? This was originally added because there was a test failure if I 
> didn't add it.  I must've changed my implementation of #2 because now 
> there is no test failure if I exclude #1.  So I'm ok backing out #1 if 
> you think we should.
> 

At first, as far as I confirmed, #2 can fix issue4934 (= test newly
added in #2 can run successfully) even without #1.

The root cause of issue4934 is that 'transaction.writepending()'
doesn't imply 'changelog._writepending()', which writes buffered
changes out, at the second hook invocation. Lack of HG_PENDING in this
case is just a side effect of it.

Therefore, it should be specification matter whether HG_PENDING can be
defined always, IMHO.

Pierre-Yves, does your describing "HG_PENDING must -only- occured in
the context of a transaction" mean that HG_PENDING can be defined even
if there is no pending data yet while transaction running ? (I think
it doesn't)

    https://www.mercurial-scm.org/wiki/DirstateTransactionPlan#Detail_about_HG_PENDING_mechanism

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp