[PATCH 1 of 2 STABLE] hooks: always include HG_PENDING

Fri Nov 6 10:16:49 CST 2015

On 11/05/2015 02:18 PM, FUJIWARA Katsunori wrote:
> At Wed, 4 Nov 2015 17:11:12 -0800,
> Durham Goode wrote:
>>
>> On 11/4/15 4:26 PM, FUJIWARA Katsunori wrote:
>>> At Tue, 3 Nov 2015 17:19:37 -0800,
>>> Durham Goode wrote:
>>>> # HG changeset patch
>>>> # User Durham Goode <durham at fb.com>
>>>> # Date 1446598693 28800
>>>> #      Tue Nov 03 16:58:13 2015 -0800
>>>> # Branch stable
>>>> # Node ID 0a772553285d61aeea3226f74698223b09fae7ac
>>>> # Parent  58b7f3e93bbab749ab16c09df12aae5ba7880708
>>>> hooks: always include HG_PENDING
>>>>
>>>> Previously we would only include HG_PENDING in the hook args if the
>>>> transaction's writepending() actually wrote something. This is a bad criteria,
>>>> since it's possible that a previous call to writepending() wrote stuff and the
>>>> hooks want to still see that.
>>>>
>>>> The solution is to always have hooks execute within the scope of the pending
>>>> changes by always putting HG_PENDING in the environment.
>>>>
>>>> diff --git a/mercurial/hook.py b/mercurial/hook.py
>>>> --- a/mercurial/hook.py
>>>> +++ b/mercurial/hook.py
>>>> @@ -122,7 +122,8 @@ def _exthook(ui, repo, name, cmd, args,
>>>>        # make in-memory changes visible to external process
>>>>        tr = repo.currenttransaction()
>>>>        repo.dirstate.write(tr)
>>>> -    if tr and tr.writepending():
>>>> +    if tr:
>>>> +        tr.writepending()
>>>>            env['HG_PENDING'] = repo.root
>>>>
>>>>        for k, v in args.iteritems():
>>> 'transaction.writepending()' saves whether there is any pending data
>>> or not into 'self._anypending'.
>>>
>>>       https://selenic.com/repo/hg/file/e5a1df51bb25/mercurial/transaction.py#l350
>>>
>>>       @active
>>>       def writepending(self):
>>>           '''write pending file to temporary version
>>>
>>>           This is used to allow hooks to view a transaction before commit'''
>>>           categories = sorted(self._pendingcallback)
>>>           for cat in categories:
>>>               # remove callback since the data will have been flushed
>>>               any = self._pendingcallback.pop(cat)(self)
>>>               self._anypending = self._anypending or any
>>>           self._anypending |= self._generatefiles(suffix='.pending')
>>>           return self._anypending
>>>
>>> It should return True as expected, if pending data has been written out
>>> once.
>>>
>>> The root cause of this issue seems that 'transaction.writepending()'
>>> removes pending callback invoked once, even if pending callback didn't
>>> actually write changes out yet.
>>>
>>> This removal of callbacks is reasonable, if it is assumed that pending
>>> callback may not be "idempotent" (e.g. writing into the file opened
>>> with "append" mode).
>>>
>>> But on the other hand, this removal may cause similar issue in other
>>> code paths = other hook combinations (or introducing new hooks).
>>>
>>> To avoid such similar issues:
>>>
>>>     - invoke 'changelog.delayupdate()' explicitly before
>>>       'transaction.writepending()' for external hook invocation, like
>>>       'dirstate.write(tr)' above
>>>
>>>       e.g. in _exthook()@hook.py
>>>
>>>           tr = repo.currenttransaction()
>>>           repo.dirstate.write(tr)
>>>           if tr:
>>>               repo.changelog.delayupdate(tr)
>>>               if tr.writepending()
>>>                   env['HG_PENDING'] = repo.root
>>>
>>>     - invoke 'changelog.delayupdate()' internally in changelog by
>>>       overriding functions below, which open files in 'w'/'a' mode by
>>>       'self.opener'
>>>
>>>       - revlog.addgroup
>>>       - revlog.addrevision
>>>       - revlog.checkinlinesize
>>>
>>>       e.g.:
>>>
>>>       def addrevision(self, text, transaction, link, p1, p2, cachedelta=None,
>>>                       node=None)
>>>           self.delayupdate(transaction)
>>>           return super(changelog, self).addrevision(.....)
>>>
>>> The former still needs 'changelog.delayupdate()' before starting
>>> changes to setup opener (and other internal status) of changelog, but
>>> the latter doesn't.
>>>
>>> I think the latter is reasonable. (yes, the former is just for
>>> comparison :-)) How about these ?
>> I think your second proposal sounds reasonable.  And will catch any
>> cases that are currently missing it.
>>>
>>>
>>> BTW, #1 of this series causes that HG_PENDING is always passed to
>>> external hooks while transaction running, even if there is no pending
>>> data. Is this reasonable ? I think that this should be backed out to
>>> avoid unintentional (trial) accessing to pending files.
>
>> Wouldn't we always want hooks to be able to see the current state of the
>> repo? This was originally added because there was a test failure if I
>> didn't add it.  I must've changed my implementation of #2 because now
>> there is no test failure if I exclude #1.  So I'm ok backing out #1 if
>> you think we should.
>
> At first, as far as I confirmed, #2 can fix issue4934 (= test newly
> added in #2 can run successfully) even without #1.
>
> The root cause of issue4934 is that 'transaction.writepending()'
> doesn't imply 'changelog._writepending()', which writes buffered
> changes out, at the second hook invocation. Lack of HG_PENDING in this
> case is just a side effect of it.
>
> Therefore, it should be specification matter whether HG_PENDING can be
> defined always, IMHO.
>
> Pierre-Yves, does your describing "HG_PENDING must -only- occured in
> the context of a transaction" mean that HG_PENDING can be defined even
> if there is no pending data yet while transaction running ? (I think
> it doesn't)
>
>      https://www.mercurial-scm.org/wiki/DirstateTransactionPlan#Detail_about_HG_PENDING_mechanism

I confirm that #1 seems wrong to me:

- writepending will is already properly reporting "True" for all call 
once something happened

- asking hooks to read the "pending" data when non-exist is superfluous 
and a change in behavior.

I've pushed a backout of #1 to the clowncopter.

I'm currently looking at #2 and related discussion.

-- 
Pierre-Yves David