[PATCH 3 of 6] dirstate: read from `.pending` file under HG_PENDING mode

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Tue May 26 13:14:56 CDT 2015


At Sun, 24 May 2015 14:16:21 -0700,
Pierre-Yves David wrote:
> 
> On 05/23/2015 03:08 PM, FUJIWARA Katsunori wrote:
> >
> > At Fri, 22 May 2015 11:57:50 -0500,
> > Pierre-Yves David wrote:
> >>
> >> On 05/22/2015 11:34 AM, FUJIWARA Katsunori wrote:
> >>>
> >>> At Thu, 21 May 2015 16:59:47 -0500,
> >>> Pierre-Yves David wrote:
> >>>>
> >>>> On 05/19/2015 11:42 AM, FUJIWARA Katsunori wrote:

(snip)

> >>> I chose `addpending()` instead of `addfilegenerator()` to write
> >>> dirstate pending file out, because assumption below of the latter
> >>> seems not suitable for dirstate characteristic:
> >>>
> >>>     - changes will be written out at successful `transaction.close()`
> >>>     - pending file will be discarded at transaction failure
> >>>
> >>> As described in patch #4 of this series, dirstate changes should be
> >>> written out (at releasing wlock) regardless of outcome of transaction
> >>> in almost all cases.
> >>
> >> This assumption are a surprise to me. Can you explain why we do not want
> >> change to the dirstate that at "done withing a transaction" to be
> >> rollbacked during that transaction.
> >>
> >> Practical example will probably help to transfer your knowledge in my
> >> direction :)
> >
> > I categorized cases below:
> >
> >   1. changing dirstate inside transaction:
> >
> >      - qpush
> >      - amend
> >      - import
> >        now, dirstate changes inside transaction are explicitly
> >        discarded at failure by `dirstateguard`
> >
> >        => OK, we can use `addfilegenerator()` in this case !
> >
> >      - transplant
> >      - unshelve
> >        dirstate is always written out at releasing wlock, and:
> >
> >        - for InterventionRequired or TransplantError, transaction is
> >          manually `close`-ed
> >        - for other exceptions, transaction is rollback-ed
> >
> >        => Can we omit writing dirstate changes out at failure (= the
> >           latter above) in this case ?
> >
> >           IMHO, probably "yes", because current dirstate parents
> >           already become "unknown" by rollback-ing changelog.
> 
> Okay! Sounds like:
> 
> - We agree on using addfilegenerator in these cases.
> - This cases are defined by "dirstate change happen inside transaction.
> 
> What exactly define "dirstate change"? Call to write? opening a dirstate 
> guard? There is some stuff in (2) that makes me doubt of the second 
> point (how easy defined they are)
>
> >   2. changing dirstate inside wlock, but outside (= before) transaction:
> 
> If the dirstate change happen -before- the transaction. Why is it now 
> written out before the transaction is open and why would we want to 
> rollback such change if it is still valid (since it point to stuff 
> outside the transaction?)
>
> Are we writing the dirstate at wlock release only? And if so, why? Is it 
> a general principle as an attempt to prevent it to be visible while 
> pointing to in-transaction content (addressed but (1) then). Is it an 
> attempt to avoid issue related to transaction rollback? (also addressed 
> by (1) ?).

I don't have strong insistence on avoiding dirstate being written out
itself.

I've just thought that let's keep current delaying dirstate being
written out until release of wlock, if it is intentional (e.g for
performance reason in large working directory and so on).



> I feel like that if the dirstate is still valid after an operation 
> fails, it is the responsibility of the command logic to restore it to a 
> previous state if necessary.
> 
> If I missed any case where it make sense (in absolute, not as currently 
> implemented) to wait for wlock release before writing, please details.
> 
> >      => We should keep changes regardless of outcome of transaction in
> >         this case for subsequent operations, like `--continue` or so
> >
> >         (e.g. `repo.commit()` may fail for pretxncommit hook detecting
> >         violation of team specific rules)
> >
> >         This means that changes held in `.pending` are still needed
> >         even if transaction is rollback-ed.
> >
> >      - backout
> >      - graft
> >      - histedit
> >      - rebase (*1)
> >
> >        (*1) Strictly speaking, `dirstateguard` saves changes before
> >             `repo.commit()`, so we can use `addfilegenerator()` for
> >             rebase. But let's forget this to make the problem simple :-)
> 
> I'm a bit confused here. These operation likely do multiple dirstate 
> modification.
> 
> Let's look at my expectation for graft:
> 
>    graft():
>      for each revs:
>        merge():
>          dirstate.change('all new file and content') #1
>          if conflict:
>            dirstate.write()
>            raise InterventionRequired → exit
>        docommit():
>          tr:
>            node = createcommit()
>            dirstate.update(node) #2
>
> Is the first one really outside its a transaction or is graft wrapping 
> all whole iteration of the loop in its own transaction
> The second one seems to fall in (case 1).
> 
> Let's not look at my expectation for rebase:
> 
>    rebase():
>      update(destination):
>        dirstate.update(destination) #1
>      for each revs:
>        merge():
>          dirstate.change('all new file and content') #2
>          if conflict:
>            dirstate.write()
>            raise InterventionRequired → exit
>        docommit():
>          tr:
>            node = createcommit()
>            dirstate.update(node) #3
> 
> The first one must obviously be preserved. It will be rolled back by `hg 
> rebase --abort` if needed.
> #2 and #3 are similar to graft.
> 
> (at this point, you likely explain me that I know nothing about how 
> dirstate works :-) )

You're right, and I'm not :-<

My implementation can't treat failure around `dirstate.update()` (=
`committablectx.markcommitted()`) at #3 for rebase correctly.

   rebase():
     update(destination):
       dirstate.update(destination) #1
     for each revs:
       merge():
         dirstate.change('all new file and content') #2
         if conflict:
           dirstate.write()
           raise InterventionRequired → exit
       docommit():
         tr:
           node = createcommit()
           hook('pretxncommit') <<<<
           dirstate.update(node) #3
           hook('pretxnclose') <<<< 

To make `--continue` work as expected:

  - dirstate should be kept, if transaction is aborted at (or before)
    pretxncommit hook above

  - changing at #3 above should be discarded, if transaction is
    aborted at (or after) pretxnclose hook above

Then, let me categorize cases again. "touching file status" and
"updating to new parents" below are corresponded to #2 and #3 in
rebase.

(strictly speaking, #3 for rebase implies also "touching file status",
but its main purpose should be "updating to new parents")

  - both of "touching file status" and "updating to new parents" occur
    inside transaction

    - amend
    - import
    - qpush
    - transplant
    - unshelve

    In this case, all changes on dirstate should be discarded at
    failure of transaction, because discarding newly added revisions
    makes recent dirstate (especially, parents of it) meaningless.

    Aborting by InterventionRequired or TransplantError is treated as
    "intentional failure" (= a kind of success), because transaction
    is manually `close`-ed at that time.

  - "updating to new parents" occurs inside transaction, but "touching
    file status" occurs outside

    - backout
    - graft
    - histedit
    - rebase

    In this case, for subsequent `--continue` (or manually committing
    for backout), "touching file status" must be kept, even at failure
    of transaction, but "updating to new parents" must not.


If writing dirstate changes out at other than `wlock.release` is
reasonable enough, this can be simplified as:

  write dirstate changes out at opening and closing the transaction,
  but restore it from saved one at unexpected failure of transaction


Then, how about this ?

  1. write in-memory changes at the beginning of the transaction, to
     restore it at failure of transaction easily

     for visibility to external hooks, just before "pretxnopen" seems
     reasonable.

  2. use `addfilegenerator()` to make in-memory changes inside
     transaction visible to external hooks

  3. in-memory changes inside transaction should be written out into
     `dirstate.pending`, even if `dirstate.write()` is explicitly
     invoked

     While transaction, `.hg/dirstate` can be kept as one at (1) above
     by (2) and (3).

  4. `dirstate.pending` (= changes inside transaction) should be:

     - renamed to `dirstate` at success of transaction

     - removed at failure of transaction
       (this is equivalent to restoring dirstate to one at (1) above)


This also does:

  - avoid inconsistency of dirstate at failure by other than
    InterventionRequired(unshelve) or TransplantError(transplant)

  - make `dirstateguard` for qpush, amend and import useless 

    restoring dirstate at failure should be done as a part of aborting
    transaction



> Writing down these two examples seems point at:
> 
>    Sometimes there is a transaction open, and we touch the dirstate to 
> record changes to working copy files. This must not be rolled back with 
> the transaction
> 
>    Sometimes there is a transaction open and we update the dirstate to 
> point at new parents. But this parent are not "new stuff in the 
> transaction" (actually, the transaction is open, but nothing happened in 
> it yet). This must not be rolled bad with the transaction
> 
> Am I right? If I'm right, this probably mean case (1) is defined by 
> "dirstate parents point inside the transaction" which will probably be a 
> bit more tricky to get right.
> 
> > IMHO, we should mainly think about (2) case above.
> >
> > It is fact that invoking `dirstate.write()` explicitly before
> > `repo.commit()` or so makes (2) case suitable for use of
> > `addfilegenerator()`. In other words, "let's use clean dirstate inside
> > transaction !".
> 
> I'm not sure what you mean here, but the previous round of question 
> probably cover that.
> 
> > But on the other hand, it is a little inefficient, because dirstate
> > changes are written out at each continuous `repo.commit()` invocations
> > (while graft, histedit and rebase), even if external process is never
> > invoked.
> 
> What is the cost of writing this dirstate change? I feel like all the 
> other operation done by these command dominate the dirstate writing.
> 
> -- 
> Pierre-Yves David
> 

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp


More information about the Mercurial-devel mailing list