[PATCH 1 of 2 STABLE] hooks: always include HG_PENDING

Fri Nov 6 11:36:11 CST 2015

On 11/04/2015 08:11 PM, Durham Goode wrote:
>
>
> On 11/4/15 4:26 PM, FUJIWARA Katsunori wrote:
>> At Tue, 3 Nov 2015 17:19:37 -0800,
>> Durham Goode wrote:
>>> # HG changeset patch
>>> # User Durham Goode <durham at fb.com>
>>> # Date 1446598693 28800
>>> #      Tue Nov 03 16:58:13 2015 -0800
>>> # Branch stable
>>> # Node ID 0a772553285d61aeea3226f74698223b09fae7ac
>>> # Parent  58b7f3e93bbab749ab16c09df12aae5ba7880708
>>> hooks: always include HG_PENDING
>>>
>>> Previously we would only include HG_PENDING in the hook args if the
>>> transaction's writepending() actually wrote something. This is a bad
>>> criteria,
>>> since it's possible that a previous call to writepending() wrote
>>> stuff and the
>>> hooks want to still see that.
>>>
>>> The solution is to always have hooks execute within the scope of the
>>> pending
>>> changes by always putting HG_PENDING in the environment.
>>>
>>> diff --git a/mercurial/hook.py b/mercurial/hook.py
>>> --- a/mercurial/hook.py
>>> +++ b/mercurial/hook.py
>>> @@ -122,7 +122,8 @@ def _exthook(ui, repo, name, cmd, args,
>>>       # make in-memory changes visible to external process
>>>       tr = repo.currenttransaction()
>>>       repo.dirstate.write(tr)
>>> -    if tr and tr.writepending():
>>> +    if tr:
>>> +        tr.writepending()
>>>           env['HG_PENDING'] = repo.root
>>>       for k, v in args.iteritems():
>> 'transaction.writepending()' saves whether there is any pending data
>> or not into 'self._anypending'.
>>
>>
>> https://selenic.com/repo/hg/file/e5a1df51bb25/mercurial/transaction.py#l350
>>
>>
>>      @active
>>      def writepending(self):
>>          '''write pending file to temporary version
>>
>>          This is used to allow hooks to view a transaction before
>> commit'''
>>          categories = sorted(self._pendingcallback)
>>          for cat in categories:
>>              # remove callback since the data will have been flushed
>>              any = self._pendingcallback.pop(cat)(self)
>>              self._anypending = self._anypending or any
>>          self._anypending |= self._generatefiles(suffix='.pending')
>>          return self._anypending
>>
>> It should return True as expected, if pending data has been written out
>> once.
>>
>> The root cause of this issue seems that 'transaction.writepending()'
>> removes pending callback invoked once, even if pending callback didn't
>> actually write changes out yet.
>>
>> This removal of callbacks is reasonable, if it is assumed that pending
>> callback may not be "idempotent" (e.g. writing into the file opened
>> with "append" mode).
>>
>> But on the other hand, this removal may cause similar issue in other
>> code paths = other hook combinations (or introducing new hooks).
>>
>> To avoid such similar issues:
>>
>>    - invoke 'changelog.delayupdate()' explicitly before
>>      'transaction.writepending()' for external hook invocation, like
>>      'dirstate.write(tr)' above
>>
>>      e.g. in _exthook()@hook.py
>>
>>          tr = repo.currenttransaction()
>>          repo.dirstate.write(tr)
>>          if tr:
>>              repo.changelog.delayupdate(tr)
>>              if tr.writepending()
>>                  env['HG_PENDING'] = repo.root
>>
>>    - invoke 'changelog.delayupdate()' internally in changelog by
>>      overriding functions below, which open files in 'w'/'a' mode by
>>      'self.opener'
>>
>>      - revlog.addgroup
>>      - revlog.addrevision
>>      - revlog.checkinlinesize
>>
>>      e.g.:
>>
>>      def addrevision(self, text, transaction, link, p1, p2,
>> cachedelta=None,
>>                      node=None)
>>          self.delayupdate(transaction)
>>          return super(changelog, self).addrevision(.....)
>>
>> The former still needs 'changelog.delayupdate()' before starting
>> changes to setup opener (and other internal status) of changelog, but
>> the latter doesn't.
>>
>> I think the latter is reasonable. (yes, the former is just for
>> comparison :-)) How about these ?
> I think your second proposal sounds reasonable.  And will catch any
> cases that are currently missing it.

The 'delayupdate' is setting up all the delayed write logic for 
changelog and is intended to be called before touching the changelog.

That delay logic is supposed to be fully re-entrant:

1) delay update is requested → an in-memory buffer is setup
2) data is added to the changelog → goes to the in memory buffer
3) someone call cl._writepending → in memory buffer is flushed to .a file
4) further write to the changelog → goes to the .a file directly
5) at transaction commit time tr._finalized is called → '.a' become the 
real file.
(if (3) never happen, the in memory buffer is flushed when the 
transaction is closed)

Issue 4934 seems to be a combination of:
a) prechangegroup is called after delayupdate and before any write to 
the changelog
b) _writepending try to be smart and does not setup a .a file nor 
mention pending change if the in memory buffer is empty.
c) the _writepending is called only once by the transaction.

There is no real reason for call the prechangegroup hook after the delay 
update call. So we should just probably invert them. that would fix the 
issue on stable while keeping the rest of the logic unaltered.

I'll send a series doing the above.

Calling "delayupdate" again at the end of addchangegroup feels 
unnatural, it is really meant to be called before the update. If we want 
to could have a second "recordchange" function called after we added 
data, such function would be in charge of setting up the transaction 
level hoops. However, proper usage of delayupdate make it not so necessary.

Avoiding doing anything in _writepending seems sensible creating the .a 
means copying the whole changelog.i to the .a file and is expensive. 
Avoiding it when the add is a no-op is a win. However, this is probably 
enough of a special case that we could ignore it?

Some more interesting points

A) One could argue that we probably never wants direct update to the 
changelog and should ensure that no write happen without that update 
delay framework in place.

B) There is two ways of seeing the pending mechanism working:

1) One time callback
2) persistent callback

Related to these two approaches, we want to avoid regenerating 
content/setting up the same logic repeatedly. Because it is expensive.

Doing it with one time callback(1) requires the callback to setup 
diversion (what changelog do) or the callback to be re-registered 
anytime something changes (phases/bookmark approach).

Doing it with persistent callback(2) requires callback to be smart about 
not doing the same work multiple time.

Mercurial is currently more invested into (1) one time callback. But not 
entirely consistent about it. EG: filegenerator are currently persistent 
and called again for each single hook invocation. We should probably be 
smarter there.

-- 
Pierre-Yves David