[PATCH] transaction: support for callbacks during abort

Pierre-Yves David pierre-yves.david at ens-lyon.org
Fri Jul 17 18:48:24 CDT 2015

On 07/16/2015 04:43 PM, FUJIWARA Katsunori wrote:
> At Tue, 14 Jul 2015 13:26:14 +0200,
> Pierre-Yves David wrote:
>> On 07/13/2015 07:44 AM, FUJIWARA Katsunori wrote:
>>> At Sat, 11 Jul 2015 20:47:00 +0100,
>>> Pierre-Yves David wrote:
>>>> On 07/11/2015 11:31 AM, FUJIWARA Katsunori wrote:
>>>>> Now, I'm working for dirstate consistency around inside and border of
>>>>> transaction, and planning to use this abort callback mechanism to know
>>>>> the end of transaction at failure.
>>>> I'm a bit curious about what code you need to run in such case. If you
>>>> use the file generator mechanism you do not need to worries about
>>>> restoring backup.
>>>> However, I'm not sure how rolling back the file would atomatically
>>>> invalide the in memory object content (probably not). Is this why you
>>>> need a callback? could we (would it make sense to) integrate that as
>>>> part of one of our current cache invalidation mechanism?
>>> I'm planning to use callback mechanism for purposes below:
>>> 1. to make existing 'dirstate.write()' invocations write in-memory
>>>      changes into appropriate file ('.hg/dirstate' or '.hg/dirstate.pending'),
>>>      'dirstate.write()' should know whether transaction is running or not
>>>      I'm planning changes below for this purpose:
>>>      - invoke (newly added) 'dirstate.begintransaction()' at the
>>>        beginning of transaction in 'localrepo.transaction()'
>>>      - invoke (newly added) 'dirstate.endtransaction()' at the end of
>>>        transaction via "the end of transaction" callback (instead of
>>>        finalize/abort callbacks)
>>>      There are some other ways:
>>>      1-1. instantiate dirstate with 'repo'
>>>           (or weak ref of repo to avoid circular reference ?)
>>>      1-2. make 'dirstate.write()' require 'localrepo' argument
>>>           This seems too large impact (especially for 3rd party
>>>           extensions, even though Mercurial doesn't ensure compatibility
>>>           of internal API at all :-))
>>>      In both cases, 'dirstate.write()' can determine appropriate output
>>>      file by examination of 'repo.currenttransaction()'.
>> I would prefer one of this options. Using hooks for such core code looks
>> a bit hacky and fragile.
> How about something new like 'unlock' of 'lock' (or 'after' of
> transaction) ? Would you think this kind of 'call back' mechanism
> itself is also fragile for core code (if newly added) ?

I do not understand what you mean.

I'm worried about hook approach being "fragile" because it dirstate 
defines very important data for the understanding of the repo. Having it 
a the same place as other extensions and config hook will probably lead 
to some ordering hell.

>> I would be happy with wither 1-1 (weak-ref
>> please) or 1-2. With a preference for 1-2.
>> You can make the repo argument optional, with a devel warning if it is
>> missing for example. To avoid too much breakage for a couple of version.
> BTW, what is your strong reason to choose 1-2 ? to avoid using
> weak ref of repo ?

I do not have a strong opinion about 1-1 vs 1-2.

1-2 avoid keeping the cyclic reference in the object design. We use the 
pattern of "providing a repo" when needed in multiple place. It also 
help refactoring code to have the argument explicitly passed. because it 
ensure function calling write have access to all necessary argument 
(here repo).

I think Matt have a strong opinion agains weakref.

>>> 2. in-memory dirstate changes should be discarded at failure of
>>>      transaction, to prevent them from being written out at (outer)
>>>      'wlock.release' or so
>>>      I'm planning to do this via "the end of transaction" callback.
>>>      Using 'dirstate.{begin|end}parentchange()' mechanism as below also
>>>      works similarly.
>>>      - invoke beginparentchange at starting transaction
>>>      - invoke endparentchange at (successfully) closing transaction
>>>      - wlock.release at failure invokes 'dirstate.invalidate()'
>>>      But it seems too optimistic for outside transaction :-), and it
>>>      can't ensure consistency just at the end of transaction.
>> I'm a bit confused about why you would invoe dirstate.invalidate() at
>> lock release. Can't we have something invalidating the dirstate content
>> when transaction if rolledback (so that dirstate is read from disk?
> (at first, 'dirstate.{begin|end}parentchange()' example above is
> just a example to do similar thing in another way, but not what I
> want to do)
>> Do we have something forcing dirstate flushing at the transaction
>> openning? (so that the backup we restore is okay?)
> AFAIK, with current implementation:
> - in-memory dirstate changes aren't explicitly written out at opening
>    transaction
>    In almost all cases, in-memory dirstate changes seem to be
>    INDIRECTLY written out before transaction opening. But it isn't
>    explicitly ensured.
>    This causes:
>    1. saving incomplete '.hg/dirstate' into '.hg/journal.dirstate' at
>       opening transaction, and
>    2. restoring one from (incorrect) '.hg/undo.dirstate', which is
>       renamed from '.hg/journal.dirstate' above, at rollback
>    (Oops, I have overlooked that journal file is created not from
>    dirstate instance but from '.hg/dirstate' file at opening
>    transaction :-< we have to fix this at first)
>    For example, at "hg backout --commit", dirstate changes via
>    reverting aren't written out until 'ctx.markcommitted()'
>    after closing the transaction. And, "hg status" after "hg
>    rollback" shows unexpected status for added/removed files.
>    BTW, after my previous change fe03f522dda9, this issue can be
>    reproduced, only when 'repo.status()' doesn't cause 'normal()'-ing
>    on some unsure files :-)
>      https://selenic.com/repo/hg/rev/fe03f522dda9

It seems like we should never start a transaction with a dirty dirstate. 
Is there any theoritical reason for not doing it?

> - in-memory dirstate changes aren't explicitly discarded at
>    aborting transaction
>    (for convenience, let's assume that '.hg/journal.dirstate' is
>    correct, at this point)
>    As you described, aborting transaction restores '.hg/dirstate' from
>    '.hg/journal.dirstate' saved at opening transaction.
>    But in-memory changes during transaction are still kept in
>    'dirstate' instance, and they may be written into already restored
>    '.hg/dirstate' at 'wlock.release()' or so.

This sounds wrong (but less keep moving)

>    BTW, commands almost fully enclosed by transaction treat "in-memory
>    changes during aborted transaction" in ways below:
>    - in-memory changes are discarded by 'dirstateguard' OUTSIDE
>      transaction scope
>      ('hg qpush', 'hg commit --amend', and 'hg import')
>      "discarding in-memory dirstate changes at aborting
>      transaction" will make dirstateguard in these cases
>      useless.
>    - in-memory changes should be discarded, but not
>      ('hg transplant')

I do not understand this sentence.

>      Aborting for other than conflict may cause unexpected dirstate
>      (e.g. referring rollbacked revision as parent). This should be
>      fixed.
>    - discarding in-memory changes causes problem
>      ('hg shelve' and 'hg unshelve')
>      Current shelve/unshelve implementation expects aborting
>      transaction NOT to discard in-memory dirstate changes intentionally
>      or unintentionally.
>      1. start transaction
>      2. do shelve/unshelve, and this makes temporary revisions
>      3. update working directory to the parent revision at (1)
>         (+ some additional actions)
>         - dirstate is different from one at (1), because
>           some changes are shelved/unshelved
>      4. abort transaction intentionally
>         - rollback revisions created at (2)
>         - restore '.hg/dirstate' from one at (1)
>      5. write dirstate at 'wlock.release'
>         - here, '.hg/dirstate' is equal to one at (3)
>      IMHO, "strip by aborting current transaction" at (4) above
>      should be replaced by "strip by rollbacking the transaction
>      after once closing it".
>      The latter doesn't strip the parent of the working directory at
>      stripping because of updating at (3): this is non-"parentgone"
>      case. Then, dirstate is kept as same as one at (3) above.

Is there anything else than shelve relying on this. Having shelve 
relying on a wrong/buggy behavior from core seems debatable.
It should not block us to fix core.

> Let me confirm how dirstate should be treated at the border of
> transaction scope.
> - at opening transaction:
>    - write in-memory dirstate changes into '.hg/dirstate' (not yet)
> - at successfully closing transaction:
>    - write changes during transaction into '.hg/dirstate' (not yet)
>    - rename from '.hg/journal.dirstate' to '.hg/undo.dirstate' (OK)
> - at aborting transaction:
>    - restore '.hg/dirstate' from '.hg/journal.dirstat' (OK)
>    - discard in-memory dirstate changes (not yet)

This summary looks perfect. Any objection/issue with doing this (beside 
shelve abuse?)

>>>>> But current implementation doesn't invoke abort callbacks, if there is
>>>>> no change on repository data at aborting transaction: e.g. "hg import"
>>>>> fails to import the first patch.
>>>> The change to dirstate should most probably register himself as change
>>>> to the transaction. That would trigger it.
>>> I found two problems of 'addfilegenerator()' below. IMHO, the former
>>> is a major problem, and the latter is minor one (but you should
>>> dislike it as later :-)).
>> If addfilegenerator fails to fit our needs here, we should probably
>> hammer it so it does.
>>> And, both of my fixing plans cause omitting invocation of abort
>>> callbacks, because they makes 'transaction._backupentries' empty :-<
>>> - current 'addfilegenerator()' causes forcibly restoring
>>>     '.hg/dirstate' at rollback-ing, even if rollback-ing doesn't change
>>>     parents of the working directory
>>>     This causes unexpected "hg status" output at rollback-ing in such
>>>     situation (aka non-"parentgone" case).
>> Do you mean there is category of change to dirstate that must not be
>> rolledback if transaction fails? I do not see data about this on the
>> wikipage (but I could be blind) Can you elaborate?
>> I'm not sure to understand the issue here.
> "rollback" above is not one by aborting while transaction running, but
> one by "hg rollback"/"repo.rollback()" after once closing transaction.
> In the later case, parents of the working directory may differ
> from one at opening transaction, and forcible restoring dirstate
> causes unexpected result of subsequent "hg status".
> Current 'addfilegenerator()' mechanism records '.hg/dirstate' as to be
> restored at rollback, and it causes forcible restoring.
> 'repo.rollback()' avoids this problem by restoring dirstate by itself.
>      https://selenic.com/repo/hg/file/tip/mercurial/localrepo.py#l1110

I'm not sure about what this section (of the email) is about. Look like 
the linked code is close to what we need to invalidate the dirstate in 
memory content. But its late so I'm giving up with this section. I think 
we made good enough progress on the other ones.

>>>     I'm planning to introduce 'backup' optional argument, which controls
>>>     'addbackup()' invocation in '_generatefiles()' to avoid issue above,
>>>     to 'addfilegenerator()'.
>>>       https://selenic.com/repo/hg/file/tip/mercurial/transaction.py#l267
>> That sound wrong to me. We either have change to be apply on transaction
>> success and rolledback on transaction failure. But I can't see case were
>> backup will be wrong.
>>>     According to discussion before, dirstate writes changes into (a)
>>>     '.hg/dirstate' at the beginning of transaction and (b)
>>>     '.hg/dirstate.pending' while transaction running. Then, discarding
>>>     the latter can work as a kind of restoring. So, omitting
>>>     'addbackup()' invocation doesn't cause problem at least for dirstate.
>>>     Or is another trick like "add to _backupentries, but omit backup
>>>     itself" better ?
>> The backup are also used for rollback (and hg recover). We must keep it.
>>> - current 'addfilegenerator()' causes forcibly writing data out at
>>>     'writepending()', even if data hasn't been changed since last
>>>     writing out
>>>     When external hooks are invoked while transaction, this causes
>>>     additional "transaction abort!" and "rollback completed" output,
>>>     even though dirstate hasn't been changed since the beginning of
>>>     transaction.
>>>     For example, pushing changes/bookmarks causes writing into
>>>     'dirstate.pending' at REMOTE side, even though it never uses and
>>>     changes dirstate.
>> If there is no change to the dirstate. Why are we recording any change
>> into the transaction at all?
> I understood that 'addfilegenerator()' is used for files which may be
> changed during transaction like "journal", and invoked it for dirstate
> at opening transaction.
> Should I use it after examination whether dirstate has been changed
> since opening transaction ?

The current semantic is:

   I've new content, here is how to generated it on disk.

So it should be called by the code updating dirstate to "commit" the 
change. It should replace (or be contained in) dirstate.write() calls.

(we should probably update the docstring to make it clean)

Pierre-Yves David

More information about the Mercurial-devel mailing list