[PATCH] transaction: support for callbacks during abort

Pierre-Yves David pierre-yves.david at ens-lyon.org
Tue Jul 14 06:26:14 CDT 2015



On 07/13/2015 07:44 AM, FUJIWARA Katsunori wrote:
>
>
> At Sat, 11 Jul 2015 20:47:00 +0100,
> Pierre-Yves David wrote:
>
>> On 07/11/2015 11:31 AM, FUJIWARA Katsunori wrote:
>>> Now, I'm working for dirstate consistency around inside and border of
>>> transaction, and planning to use this abort callback mechanism to know
>>> the end of transaction at failure.
>>
>> I'm a bit curious about what code you need to run in such case. If you
>> use the file generator mechanism you do not need to worries about
>> restoring backup.
>>
>> However, I'm not sure how rolling back the file would atomatically
>> invalide the in memory object content (probably not). Is this why you
>> need a callback? could we (would it make sense to) integrate that as
>> part of one of our current cache invalidation mechanism?
>
>
> I'm planning to use callback mechanism for purposes below:
>
> 1. to make existing 'dirstate.write()' invocations write in-memory
>     changes into appropriate file ('.hg/dirstate' or '.hg/dirstate.pending'),
>     'dirstate.write()' should know whether transaction is running or not
>
>     I'm planning changes below for this purpose:
>
>     - invoke (newly added) 'dirstate.begintransaction()' at the
>       beginning of transaction in 'localrepo.transaction()'
>
>     - invoke (newly added) 'dirstate.endtransaction()' at the end of
>       transaction via "the end of transaction" callback (instead of
>       finalize/abort callbacks)
>
>
>     There are some other ways:
>
>     1-1. instantiate dirstate with 'repo'
>          (or weak ref of repo to avoid circular reference ?)
>
>     1-2. make 'dirstate.write()' require 'localrepo' argument
>
>          This seems too large impact (especially for 3rd party
>          extensions, even though Mercurial doesn't ensure compatibility
>          of internal API at all :-))
>
>     In both cases, 'dirstate.write()' can determine appropriate output
>     file by examination of 'repo.currenttransaction()'.

I would prefer one of this options. Using hooks for such core code looks 
a bit hacky and fragile. I would be happy with wither 1-1 (weak-ref 
please) or 1-2. With a preference for 1-2.

You can make the repo argument optional, with a devel warning if it is 
missing for example. To avoid too much breakage for a couple of version.

> 2. in-memory dirstate changes should be discarded at failure of
>     transaction, to prevent them from being written out at (outer)
>     'wlock.release' or so
>
>     I'm planning to do this via "the end of transaction" callback.
>
>
>     Using 'dirstate.{begin|end}parentchange()' mechanism as below also
>     works similarly.
>
>     - invoke beginparentchange at starting transaction
>     - invoke endparentchange at (successfully) closing transaction
>     - wlock.release at failure invokes 'dirstate.invalidate()'
>
>     But it seems too optimistic for outside transaction :-), and it
>     can't ensure consistency just at the end of transaction.

I'm a bit confused about why you would invoe dirstate.invalidate() at 
lock release. Can't we have something invalidating the dirstate content 
when transaction if rolledback (so that dirstate is read from disk?

Do we have something forcing dirstate flushing at the transaction 
openning? (so that the backup we restore is okay?)

>>> But current implementation doesn't invoke abort callbacks, if there is
>>> no change on repository data at aborting transaction: e.g. "hg import"
>>> fails to import the first patch.
>>
>> The change to dirstate should most probably register himself as change
>> to the transaction. That would trigger it.
>
> I found two problems of 'addfilegenerator()' below. IMHO, the former
> is a major problem, and the latter is minor one (but you should
> dislike it as later :-)).

If addfilegenerator fails to fit our needs here, we should probably 
hammer it so it does.

> And, both of my fixing plans cause omitting invocation of abort
> callbacks, because they makes 'transaction._backupentries' empty :-<
>
> - current 'addfilegenerator()' causes forcibly restoring
>    '.hg/dirstate' at rollback-ing, even if rollback-ing doesn't change
>    parents of the working directory
>
>    This causes unexpected "hg status" output at rollback-ing in such
>    situation (aka non-"parentgone" case).

Do you mean there is category of change to dirstate that must not be 
rolledback if transaction fails? I do not see data about this on the 
wikipage (but I could be blind) Can you elaborate?

I'm not sure to understand the issue here.

>    I'm planning to introduce 'backup' optional argument, which controls
>    'addbackup()' invocation in '_generatefiles()' to avoid issue above,
>    to 'addfilegenerator()'.
>
>      https://selenic.com/repo/hg/file/tip/mercurial/transaction.py#l267

That sound wrong to me. We either have change to be apply on transaction 
success and rolledback on transaction failure. But I can't see case were 
backup will be wrong.


>    According to discussion before, dirstate writes changes into (a)
>    '.hg/dirstate' at the beginning of transaction and (b)
>    '.hg/dirstate.pending' while transaction running. Then, discarding
>    the latter can work as a kind of restoring. So, omitting
>    'addbackup()' invocation doesn't cause problem at least for dirstate.
>
>    Or is another trick like "add to _backupentries, but omit backup
>    itself" better ?

The backup are also used for rollback (and hg recover). We must keep it.

>
> - current 'addfilegenerator()' causes forcibly writing data out at
>    'writepending()', even if data hasn't been changed since last
>    writing out
>
>    When external hooks are invoked while transaction, this causes
>    additional "transaction abort!" and "rollback completed" output,
>    even though dirstate hasn't been changed since the beginning of
>    transaction.
>
>    For example, pushing changes/bookmarks causes writing into
>    'dirstate.pending' at REMOTE side, even though it never uses and
>    changes dirstate.

If there is no change to the dirstate. Why are we recording any change 
into the transaction at all?

>    I'm planning to introduce 'checkdirty' optional argument, which is
>    used to examine whether data has been changed since last writing
>    out (= 'genfunc' should be invoked or not), to avoid redundant
>    writing data out.

Yes, this way of detecting dirtyness have been bothering me in the past. 
I've a small series to clean this us, let me undust it and patchbomb it.

>>> So, there is no easy and certain way to know the end of transaction at
>>> failure.
>>>
>>> BTW, this also causes that 'txnabort' hook isn't invoked at aborting
>>> transaction in such case, because 'txnabort' hook invocation is
>>> achieved by abort callback mechanism. This may need some cleanup for
>>> orphan 'pretxnopen' hook assuming that one of 'txnclose' or 'txnabort'
>>> will be invoked subsequently.
>>>
>>> Then, please let me confirm some points below before changing around
>>> 'transaction._abort()' for my purpose:
>>>
>>>     - are there any usecases to expect actual changes to be rollbacked ?
>>>
>>>       In other words, does it cause problem to invoke abort callbacks
>>>       regardless of actual changes ?
>>
>> If I remember correctly, it make "transaction abort" "abort complete
>> message to happears in a lot of new place. Basically any place where we
>> open a transaction but eventually do nothing. We can probably not change it
>
> Adding invocation of abort callbacks just before cleaning journal
> files up can avoid such additional messages, even though duplicated
> code paths are redundant :-)
>
>      https://selenic.com/repo/hg/file/tip/mercurial/transaction.py#l477
>
>      diff --git a/mercurial/transaction.py b/mercurial/transaction.py
>      --- a/mercurial/transaction.py
>      +++ b/mercurial/transaction.py
>      @@ -482,6 +482,8 @@ class transaction(object):
>
>               try:
>                   if not self.entries and not self._backupentries:
>      +                for cat in sorted(self._abortcallback):
>      +                    self._abortcallback[cat](self)
>                       if self.journal:
>                           self.opener.unlink(self.journal)
>                       if self._backupjournal:

I do not understand what we are trying to achieve here. Having a wider 
definition of abort (abort apply even if transaction is empty) seems a 
dubious semantic and will have huge impact in the UI.


>
>
>>>     - are there any usecases to refer changes to be rollbacked in abort
>>>       callbacks ?
>>>
>>>       In other words, does it cause problem to invoke abort callbacks
>>>       after rollbacking ?
>>
>> I would says yes, but detail could be complicated (to be continued)
>>
>>>
>>>       BTW, 'txnabort' hooks are invoked without 'pending', and it means
>>>       that such changes should be invisible to 'txnabort' hooks, doesn't
>>>       it ? > Pierre-Yves
>>
>> And here we have proof of (1) people do not rely of it too much (2)
>> definition of the state of the transaction during abort is far from trivial.
>> We should probably keep it with rolled back content/
>
> It means that 'txnabort' hooks should be invoked with 'pending' (=
> rolled back content should be visible to them), doen't it ? (sorry for
> my poor English, if I misunderstood :-<)

No. It means that:

1) Nobody complains about the lack of pending so far (since we do not 
have it).

2) I'm not sure we can have anything sensible done here (reading pending 
with a transaction half applied, will be bad)

Conclusion: we should keep not having pending there.

>>>       Then, IMHO, such changes should be invisible to abort callbacks,
>>>       too.
>>>
>>> Maybe, should I introduce another callback to know the end of
>>> transaction (regardless of the result of it, or only at failure) ?
>>
>> Sounds like an awesome idea. Python have:
>>
>> try:
>>       .... # equivalent to transaction open
>> except:
>>       .... # equivalent to transaction abort
>> else:
>>       .... # equivalent to transaction success
>> finally:
>>       .... # equivalent of your proposal
>>
>> Finally is definitely very useful, so go for it.
>
> I'll do so.

Cool, (but It is not clear it is on the critical path, so no rush).

-- 
Pierre-Yves David


More information about the Mercurial-devel mailing list