[PATCH 1 of 2 STABLE] hooks: always include HG_PENDING

Fri Nov 6 12:00:12 CST 2015

On 11/6/15 9:36 AM, Pierre-Yves David wrote:
>
>
> On 11/04/2015 08:11 PM, Durham Goode wrote:
>>
>>
>> On 11/4/15 4:26 PM, FUJIWARA Katsunori wrote:
>>> At Tue, 3 Nov 2015 17:19:37 -0800,
>>> Durham Goode wrote:
>>>> # HG changeset patch
>>>> # User Durham Goode <durham at fb.com>
>>>> # Date 1446598693 28800
>>>> #      Tue Nov 03 16:58:13 2015 -0800
>>>> # Branch stable
>>>> # Node ID 0a772553285d61aeea3226f74698223b09fae7ac
>>>> # Parent  58b7f3e93bbab749ab16c09df12aae5ba7880708
>>>> hooks: always include HG_PENDING
>>>>
>>>> Previously we would only include HG_PENDING in the hook args if the
>>>> transaction's writepending() actually wrote something. This is a bad
>>>> criteria,
>>>> since it's possible that a previous call to writepending() wrote
>>>> stuff and the
>>>> hooks want to still see that.
>>>>
>>>> The solution is to always have hooks execute within the scope of the
>>>> pending
>>>> changes by always putting HG_PENDING in the environment.
>>>>
>>>> diff --git a/mercurial/hook.py b/mercurial/hook.py
>>>> --- a/mercurial/hook.py
>>>> +++ b/mercurial/hook.py
>>>> @@ -122,7 +122,8 @@ def _exthook(ui, repo, name, cmd, args,
>>>>       # make in-memory changes visible to external process
>>>>       tr = repo.currenttransaction()
>>>>       repo.dirstate.write(tr)
>>>> -    if tr and tr.writepending():
>>>> +    if tr:
>>>> +        tr.writepending()
>>>>           env['HG_PENDING'] = repo.root
>>>>       for k, v in args.iteritems():
>>> 'transaction.writepending()' saves whether there is any pending data
>>> or not into 'self._anypending'.
>>>
>>>
>>> https://selenic.com/repo/hg/file/e5a1df51bb25/mercurial/transaction.py#l350 
>>>
>>>
>>>
>>>      @active
>>>      def writepending(self):
>>>          '''write pending file to temporary version
>>>
>>>          This is used to allow hooks to view a transaction before
>>> commit'''
>>>          categories = sorted(self._pendingcallback)
>>>          for cat in categories:
>>>              # remove callback since the data will have been flushed
>>>              any = self._pendingcallback.pop(cat)(self)
>>>              self._anypending = self._anypending or any
>>>          self._anypending |= self._generatefiles(suffix='.pending')
>>>          return self._anypending
>>>
>>> It should return True as expected, if pending data has been written out
>>> once.
>>>
>>> The root cause of this issue seems that 'transaction.writepending()'
>>> removes pending callback invoked once, even if pending callback didn't
>>> actually write changes out yet.
>>>
>>> This removal of callbacks is reasonable, if it is assumed that pending
>>> callback may not be "idempotent" (e.g. writing into the file opened
>>> with "append" mode).
>>>
>>> But on the other hand, this removal may cause similar issue in other
>>> code paths = other hook combinations (or introducing new hooks).
>>>
>>> To avoid such similar issues:
>>>
>>>    - invoke 'changelog.delayupdate()' explicitly before
>>>      'transaction.writepending()' for external hook invocation, like
>>>      'dirstate.write(tr)' above
>>>
>>>      e.g. in _exthook()@hook.py
>>>
>>>          tr = repo.currenttransaction()
>>>          repo.dirstate.write(tr)
>>>          if tr:
>>>              repo.changelog.delayupdate(tr)
>>>              if tr.writepending()
>>>                  env['HG_PENDING'] = repo.root
>>>
>>>    - invoke 'changelog.delayupdate()' internally in changelog by
>>>      overriding functions below, which open files in 'w'/'a' mode by
>>>      'self.opener'
>>>
>>>      - revlog.addgroup
>>>      - revlog.addrevision
>>>      - revlog.checkinlinesize
>>>
>>>      e.g.:
>>>
>>>      def addrevision(self, text, transaction, link, p1, p2,
>>> cachedelta=None,
>>>                      node=None)
>>>          self.delayupdate(transaction)
>>>          return super(changelog, self).addrevision(.....)
>>>
>>> The former still needs 'changelog.delayupdate()' before starting
>>> changes to setup opener (and other internal status) of changelog, but
>>> the latter doesn't.
>>>
>>> I think the latter is reasonable. (yes, the former is just for
>>> comparison :-)) How about these ?
>> I think your second proposal sounds reasonable.  And will catch any
>> cases that are currently missing it.
>
> The 'delayupdate' is setting up all the delayed write logic for 
> changelog and is intended to be called before touching the changelog.
>
> That delay logic is supposed to be fully re-entrant:
>
> 1) delay update is requested → an in-memory buffer is setup
> 2) data is added to the changelog → goes to the in memory buffer
> 3) someone call cl._writepending → in memory buffer is flushed to .a file
> 4) further write to the changelog → goes to the .a file directly
> 5) at transaction commit time tr._finalized is called → '.a' become 
> the real file.
> (if (3) never happen, the in memory buffer is flushed when the 
> transaction is closed)
>
> Issue 4934 seems to be a combination of:
> a) prechangegroup is called after delayupdate and before any write to 
> the changelog
> b) _writepending try to be smart and does not setup a .a file nor 
> mention pending change if the in memory buffer is empty.
> c) the _writepending is called only once by the transaction.
>
>
> There is no real reason for call the prechangegroup hook after the 
> delay update call. So we should just probably invert them. that would 
> fix the issue on stable while keeping the rest of the logic unaltered.
>
> I'll send a series doing the above.
>
> Calling "delayupdate" again at the end of addchangegroup feels 
> unnatural, it is really meant to be called before the update. If we 
> want to could have a second "recordchange" function called after we 
> added data, such function would be in charge of setting up the 
> transaction level hoops. However, proper usage of delayupdate make it 
> not so necessary.
The current "proper usage" is to make sure tr.writepending is not called 
anywhere between delayupdate and the changelog write.  That seems like a 
very weird and unintuitive dependency that other things (like 
extensions) might accidentally break.

I think it'd be more intuitive to have delayupdate() just prep the .a 
redirect (but don't subscribe to writepending), then have another 
function be called after each changelog write to add the writepending 
subscription.  That follows the more normal pattern of "prepare for 
writes" + "mark dirty".