[PATCH] transaction: support for callbacks during abort

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Thu Jul 16 09:43:11 CDT 2015


At Tue, 14 Jul 2015 13:26:14 +0200,
Pierre-Yves David wrote:
>
> On 07/13/2015 07:44 AM, FUJIWARA Katsunori wrote:
> >
> >
> > At Sat, 11 Jul 2015 20:47:00 +0100,
> > Pierre-Yves David wrote:
> >
> >> On 07/11/2015 11:31 AM, FUJIWARA Katsunori wrote:
> >>> Now, I'm working for dirstate consistency around inside and border of
> >>> transaction, and planning to use this abort callback mechanism to know
> >>> the end of transaction at failure.
> >>
> >> I'm a bit curious about what code you need to run in such case. If you
> >> use the file generator mechanism you do not need to worries about
> >> restoring backup.
> >>
> >> However, I'm not sure how rolling back the file would atomatically
> >> invalide the in memory object content (probably not). Is this why you
> >> need a callback? could we (would it make sense to) integrate that as
> >> part of one of our current cache invalidation mechanism?
> >
> >
> > I'm planning to use callback mechanism for purposes below:
> >
> > 1. to make existing 'dirstate.write()' invocations write in-memory
> >     changes into appropriate file ('.hg/dirstate' or '.hg/dirstate.pending'),
> >     'dirstate.write()' should know whether transaction is running or not
> >
> >     I'm planning changes below for this purpose:
> >
> >     - invoke (newly added) 'dirstate.begintransaction()' at the
> >       beginning of transaction in 'localrepo.transaction()'
> >
> >     - invoke (newly added) 'dirstate.endtransaction()' at the end of
> >       transaction via "the end of transaction" callback (instead of
> >       finalize/abort callbacks)
> >
> >
> >     There are some other ways:
> >
> >     1-1. instantiate dirstate with 'repo'
> >          (or weak ref of repo to avoid circular reference ?)
> >
> >     1-2. make 'dirstate.write()' require 'localrepo' argument
> >
> >          This seems too large impact (especially for 3rd party
> >          extensions, even though Mercurial doesn't ensure compatibility
> >          of internal API at all :-))
> >
> >     In both cases, 'dirstate.write()' can determine appropriate output
> >     file by examination of 'repo.currenttransaction()'.
>
> I would prefer one of this options. Using hooks for such core code looks
> a bit hacky and fragile.

How about something new like 'unlock' of 'lock' (or 'after' of
transaction) ? Would you think this kind of 'call back' mechanism
itself is also fragile for core code (if newly added) ?


> I would be happy with wither 1-1 (weak-ref
> please) or 1-2. With a preference for 1-2.
>
> You can make the repo argument optional, with a devel warning if it is
> missing for example. To avoid too much breakage for a couple of version.

BTW, what is your strong reason to choose 1-2 ? to avoid using
weak ref of repo ?



> > 2. in-memory dirstate changes should be discarded at failure of
> >     transaction, to prevent them from being written out at (outer)
> >     'wlock.release' or so
> >
> >     I'm planning to do this via "the end of transaction" callback.
> >
> >
> >     Using 'dirstate.{begin|end}parentchange()' mechanism as below also
> >     works similarly.
> >
> >     - invoke beginparentchange at starting transaction
> >     - invoke endparentchange at (successfully) closing transaction
> >     - wlock.release at failure invokes 'dirstate.invalidate()'
> >
> >     But it seems too optimistic for outside transaction :-), and it
> >     can't ensure consistency just at the end of transaction.
>
> I'm a bit confused about why you would invoe dirstate.invalidate() at
> lock release. Can't we have something invalidating the dirstate content
> when transaction if rolledback (so that dirstate is read from disk?

(at first, 'dirstate.{begin|end}parentchange()' example above is
just a example to do similar thing in another way, but not what I
want to do)

> Do we have something forcing dirstate flushing at the transaction
> openning? (so that the backup we restore is okay?)

AFAIK, with current implementation:

- in-memory dirstate changes aren't explicitly written out at opening
  transaction

  In almost all cases, in-memory dirstate changes seem to be
  INDIRECTLY written out before transaction opening. But it isn't
  explicitly ensured.

  This causes:

  1. saving incomplete '.hg/dirstate' into '.hg/journal.dirstate' at
     opening transaction, and

  2. restoring one from (incorrect) '.hg/undo.dirstate', which is
     renamed from '.hg/journal.dirstate' above, at rollback

  (Oops, I have overlooked that journal file is created not from
  dirstate instance but from '.hg/dirstate' file at opening
  transaction :-< we have to fix this at first)

  For example, at "hg backout --commit", dirstate changes via
  reverting aren't written out until 'ctx.markcommitted()'
  after closing the transaction. And, "hg status" after "hg
  rollback" shows unexpected status for added/removed files.

  BTW, after my previous change fe03f522dda9, this issue can be
  reproduced, only when 'repo.status()' doesn't cause 'normal()'-ing
  on some unsure files :-)

    https://selenic.com/repo/hg/rev/fe03f522dda9

- in-memory dirstate changes aren't explicitly discarded at
  aborting transaction

  (for convenience, let's assume that '.hg/journal.dirstate' is
  correct, at this point)

  As you described, aborting transaction restores '.hg/dirstate' from
  '.hg/journal.dirstate' saved at opening transaction.

  But in-memory changes during transaction are still kept in
  'dirstate' instance, and they may be written into already restored
  '.hg/dirstate' at 'wlock.release()' or so.

  BTW, commands almost fully enclosed by transaction treat "in-memory
  changes during aborted transaction" in ways below:

  - in-memory changes are discarded by 'dirstateguard' OUTSIDE
    transaction scope
    ('hg qpush', 'hg commit --amend', and 'hg import')

    "discarding in-memory dirstate changes at aborting
    transaction" will make dirstateguard in these cases
    useless.

  - in-memory changes should be discarded, but not
    ('hg transplant')

    Aborting for other than conflict may cause unexpected dirstate
    (e.g. referring rollbacked revision as parent). This should be
    fixed.

  - discarding in-memory changes causes problem
    ('hg shelve' and 'hg unshelve')

    Current shelve/unshelve implementation expects aborting
    transaction NOT to discard in-memory dirstate changes intentionally
    or unintentionally.

    1. start transaction
    2. do shelve/unshelve, and this makes temporary revisions
    3. update working directory to the parent revision at (1)
       (+ some additional actions)
       - dirstate is different from one at (1), because
         some changes are shelved/unshelved
    4. abort transaction intentionally
       - rollback revisions created at (2)
       - restore '.hg/dirstate' from one at (1)
    5. write dirstate at 'wlock.release'
       - here, '.hg/dirstate' is equal to one at (3)

    IMHO, "strip by aborting current transaction" at (4) above
    should be replaced by "strip by rollbacking the transaction
    after once closing it".

    The latter doesn't strip the parent of the working directory at
    stripping because of updating at (3): this is non-"parentgone"
    case. Then, dirstate is kept as same as one at (3) above.

Let me confirm how dirstate should be treated at the border of
transaction scope.

- at opening transaction:

  - write in-memory dirstate changes into '.hg/dirstate' (not yet)

- at successfully closing transaction:

  - write changes during transaction into '.hg/dirstate' (not yet)
  - rename from '.hg/journal.dirstate' to '.hg/undo.dirstate' (OK)

- at aborting transaction:

  - restore '.hg/dirstate' from '.hg/journal.dirstat' (OK)
  - discard in-memory dirstate changes (not yet)


> >>> But current implementation doesn't invoke abort callbacks, if there is
> >>> no change on repository data at aborting transaction: e.g. "hg import"
> >>> fails to import the first patch.
> >>
> >> The change to dirstate should most probably register himself as change
> >> to the transaction. That would trigger it.
> >
> > I found two problems of 'addfilegenerator()' below. IMHO, the former
> > is a major problem, and the latter is minor one (but you should
> > dislike it as later :-)).
>
> If addfilegenerator fails to fit our needs here, we should probably
> hammer it so it does.
>
> > And, both of my fixing plans cause omitting invocation of abort
> > callbacks, because they makes 'transaction._backupentries' empty :-<
> >
> > - current 'addfilegenerator()' causes forcibly restoring
> >    '.hg/dirstate' at rollback-ing, even if rollback-ing doesn't change
> >    parents of the working directory
> >
> >    This causes unexpected "hg status" output at rollback-ing in such
> >    situation (aka non-"parentgone" case).
>
> Do you mean there is category of change to dirstate that must not be
> rolledback if transaction fails? I do not see data about this on the
> wikipage (but I could be blind) Can you elaborate?
>
> I'm not sure to understand the issue here.

"rollback" above is not one by aborting while transaction running, but
one by "hg rollback"/"repo.rollback()" after once closing transaction.

In the later case, parents of the working directory may differ
from one at opening transaction, and forcible restoring dirstate
causes unexpected result of subsequent "hg status".

Current 'addfilegenerator()' mechanism records '.hg/dirstate' as to be
restored at rollback, and it causes forcible restoring.

'repo.rollback()' avoids this problem by restoring dirstate by itself.

    https://selenic.com/repo/hg/file/tip/mercurial/localrepo.py#l1110


> >    I'm planning to introduce 'backup' optional argument, which controls
> >    'addbackup()' invocation in '_generatefiles()' to avoid issue above,
> >    to 'addfilegenerator()'.
> >
> >      https://selenic.com/repo/hg/file/tip/mercurial/transaction.py#l267
>
> That sound wrong to me. We either have change to be apply on transaction
> success and rolledback on transaction failure. But I can't see case were
> backup will be wrong.
>
>
> >    According to discussion before, dirstate writes changes into (a)
> >    '.hg/dirstate' at the beginning of transaction and (b)
> >    '.hg/dirstate.pending' while transaction running. Then, discarding
> >    the latter can work as a kind of restoring. So, omitting
> >    'addbackup()' invocation doesn't cause problem at least for dirstate.
> >
> >    Or is another trick like "add to _backupentries, but omit backup
> >    itself" better ?
>
> The backup are also used for rollback (and hg recover). We must keep it.
>
> >
> > - current 'addfilegenerator()' causes forcibly writing data out at
> >    'writepending()', even if data hasn't been changed since last
> >    writing out
> >
> >    When external hooks are invoked while transaction, this causes
> >    additional "transaction abort!" and "rollback completed" output,
> >    even though dirstate hasn't been changed since the beginning of
> >    transaction.
> >
> >    For example, pushing changes/bookmarks causes writing into
> >    'dirstate.pending' at REMOTE side, even though it never uses and
> >    changes dirstate.
>
> If there is no change to the dirstate. Why are we recording any change
> into the transaction at all?

I understood that 'addfilegenerator()' is used for files which may be
changed during transaction like "journal", and invoked it for dirstate
at opening transaction.

Should I use it after examination whether dirstate has been changed
since opening transaction ?


> >    I'm planning to introduce 'checkdirty' optional argument, which is
> >    used to examine whether data has been changed since last writing
> >    out (= 'genfunc' should be invoked or not), to avoid redundant
> >    writing data out.
>
> Yes, this way of detecting dirtyness have been bothering me in the past.
> I've a small series to clean this us, let me undust it and patchbomb it.
>
> >>> So, there is no easy and certain way to know the end of transaction at
> >>> failure.
> >>>
> >>> BTW, this also causes that 'txnabort' hook isn't invoked at aborting
> >>> transaction in such case, because 'txnabort' hook invocation is
> >>> achieved by abort callback mechanism. This may need some cleanup for
> >>> orphan 'pretxnopen' hook assuming that one of 'txnclose' or 'txnabort'
> >>> will be invoked subsequently.
> >>>
> >>> Then, please let me confirm some points below before changing around
> >>> 'transaction._abort()' for my purpose:
> >>>
> >>>     - are there any usecases to expect actual changes to be rollbacked ?
> >>>
> >>>       In other words, does it cause problem to invoke abort callbacks
> >>>       regardless of actual changes ?
> >>
> >> If I remember correctly, it make "transaction abort" "abort complete
> >> message to happears in a lot of new place. Basically any place where we
> >> open a transaction but eventually do nothing. We can probably not change it
> >
> > Adding invocation of abort callbacks just before cleaning journal
> > files up can avoid such additional messages, even though duplicated
> > code paths are redundant :-)
> >
> >      https://selenic.com/repo/hg/file/tip/mercurial/transaction.py#l477
> >
> >      diff --git a/mercurial/transaction.py b/mercurial/transaction.py
> >      --- a/mercurial/transaction.py
> >      +++ b/mercurial/transaction.py
> >      @@ -482,6 +482,8 @@ class transaction(object):
> >
> >               try:
> >                   if not self.entries and not self._backupentries:
> >      +                for cat in sorted(self._abortcallback):
> >      +                    self._abortcallback[cat](self)
> >                       if self.journal:
> >                           self.opener.unlink(self.journal)
> >                       if self._backupjournal:
>
> I do not understand what we are trying to achieve here. Having a wider
> definition of abort (abort apply even if transaction is empty) seems a
> dubious semantic and will have huge impact in the UI.
>
>
> >
> >
> >>>     - are there any usecases to refer changes to be rollbacked in abort
> >>>       callbacks ?
> >>>
> >>>       In other words, does it cause problem to invoke abort callbacks
> >>>       after rollbacking ?
> >>
> >> I would says yes, but detail could be complicated (to be continued)
> >>
> >>>
> >>>       BTW, 'txnabort' hooks are invoked without 'pending', and it means
> >>>       that such changes should be invisible to 'txnabort' hooks, doesn't
> >>>       it ? > Pierre-Yves
> >>
> >> And here we have proof of (1) people do not rely of it too much (2)
> >> definition of the state of the transaction during abort is far from trivial.
> >> We should probably keep it with rolled back content/
> >
> > It means that 'txnabort' hooks should be invoked with 'pending' (=
> > rolled back content should be visible to them), doen't it ? (sorry for
> > my poor English, if I misunderstood :-<)
>
> No. It means that:
>
> 1) Nobody complains about the lack of pending so far (since we do not
> have it).
>
> 2) I'm not sure we can have anything sensible done here (reading pending
> with a transaction half applied, will be bad)
>
> Conclusion: we should keep not having pending there.

OK, I see.


> >>>       Then, IMHO, such changes should be invisible to abort callbacks,
> >>>       too.
> >>>
> >>> Maybe, should I introduce another callback to know the end of
> >>> transaction (regardless of the result of it, or only at failure) ?
> >>
> >> Sounds like an awesome idea. Python have:
> >>
> >> try:
> >>       .... # equivalent to transaction open
> >> except:
> >>       .... # equivalent to transaction abort
> >> else:
> >>       .... # equivalent to transaction success
> >> finally:
> >>       .... # equivalent of your proposal
> >>
> >> Finally is definitely very useful, so go for it.
> >
> > I'll do so.
>
> Cool, (but It is not clear it is on the critical path, so no rush).
>
> --
> Pierre-Yves David
>


----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp


More information about the Mercurial-devel mailing list