[PATCH 2 of 5] localrepo: invoke dirstate.unsureifambig in wwrite for safety

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Mon Jun 22 11:19:34 CDT 2015

At Fri, 12 Jun 2015 02:55:04 +0900,
FUJIWARA Katsunori wrote:
> At Wed, 10 Jun 2015 11:32:25 -0700,
> Pierre-Yves David wrote:
> > 
> > 
> > 
> > On 06/10/2015 09:39 AM, Martin von Zweigbergk wrote:
> > >
> > >
> > > On Tue, Jun 9, 2015 at 12:58 PM Pierre-Yves David
> > > <pierre-yves.david at ens-lyon.org <mailto:pierre-yves.david at ens-lyon.org>>
> > > wrote:


> > >     Wait, is this series directly related with the dirstate
> > >     guard/transaction thing ? (as in, the next step we must do) or is it an
> > >     unrelated fix to dirstate.
> > >
> > >
> > > I think it is related, but I'll let Fujiwara confirm.
> > >
> > >     If it is unrelated, we should probably focus on fixing the dirstate
> > >     write scheduling before working on this issue.
> > >
> > > And what do you recommend if it is related?
> > 
> > I've to admit I stopped following your discussion closely after roudn 
> > trip. If this is on the critical path for dirstateguard, we should 
> > probably start building a solid repro before taking care of it.
> At first, I thought that this issue can be fixed fully separately from
> fixing issue4378 or so.
> But for total consistency of behavior at failure, we should make
> consensus about handling dirstate changes generally, before working
> more about issue4378. At least, deciding direction can avoid
> meaningless side trip or so for me :-)
> I'll post summary of issue4378 and this issue from the same point of
> view as a basis for discussion.

Sorry for late response.

I summarized about "dirstate changes at failure" as a basis for
discussion. Please comment !

BTW, it is assumed that `dirstate.normallookup` is invoked at
`repo.wwrite` or so, because, without these changes, problems like
issue4583 ("changed files are treated as clean") appear occasionally,
even if processing terminates successfully (e.g, at `hg revert`, `hg
import` and so on):


1. `hg commit`

Let's see typical case `hg commit`.

Lines ending with "#N" means "dirstate is changed here". Scope of
wlock/transaction is described in "with xxxx as yyyy:" style for

      if opts['addremove']:
          dirstate.update('files {added|removed|copied}') #1
          # involving dirstate.write() at wlock.release()

        with repo.wlock() as wlock:
          repo.status('to get current status') #2

          editor('for commit message')
          wctx.sub(s).commit('recursively') for subs

          with repo.transaction('commit') as tr:
            node = createcommit()
            dirstate.update(node) #3

Changes at #3 above should be discarded for consistency of subsequent
`hg status` and so on, because:

  - `dirstate.parents` refers rolled back (= not-existing) revision
  - files are already `dirstate.normal`-ed

This is reason why (1) dirstate should be saved just before starting
transaction, and (2) restored if transaction is rolled back.

This was already discussed for "making in-memory dirstate changes
visible to external process" to fix issue4378.


Then, how should we treat #1 and #2 ?

  1. restore dirstate at failure of command

     This policy isn't suitable for `hg commit --addremove` case,
     because dirstate changes at #1 above should be kept for backward
     compatibility, even if `hg commit` itself fails.

     At least, the way to avoid restoring dirstate at failure (a kind
     of the inverse of `dirstateguard` ?) is required for this policy.

     For example, commands below expect that dirstate changes are kept
     even if command itself is terminated with non-0 exit code
     (detailed flows are shown in subsequent section):

       - aborting by exception
         `hg graft`, `hg rebase` and so on

       - terminating normally with non-0 exit code
         conflicts at `hg backout` without --merge

  2. restore dirstate at failure in wlock scope

     Changes at #2 (just `normal`-ing) can be discarded at failure in
     wlock scope.

     This policy is suitable at least for `hg commit` case.

     But some other commands expect that dirstate changes are kept
     even if command itself is aborted by exception, as described

     So, the way to avoid restoring dirstate at failure is required
     for this policy, too. In addition to it, code path enclosed by
     `repo.wlock()` + `wlock.release` may be nested at runtime, and
     this may increase complexity.

  3. keep dirstate changes (and write them out) even at failure

     This is also suitable for `hg commit` case, because this is as
     same as current behavior :-)

Let's see other cases, too.

2. `hg update`

        with repo.wlock() as wlock:
          wctx.files() / wctx.dirty() #1


          # critical region >>>>
          repo.wwrite('all files to be updated')
          wctx.sub(s).get('update recursively') for subs
          dirstate.change('all updated files') #2
          # critical region <<<<


Changes at #1 above can be treated as similarly as ones at #2 in `hg
commit` case. So, let's ignore it here :-)

Aborting in "critical region" above causes the issue that changed
files are treated as clean, as I described in previous post.


There are some solutions to fix this problem:

  1. restore dirstate at failure of command

  2. restore dirstate at failure in wlock scope

  3. enclose "critical region" by `dirstateguard` to restore dirstate
     at failure (but keep other dirstate changes even at failure)

  4. apply `dirstate.normallookup` or so explicitly on files to be
     updated before entering "critical region" (and keep dirstate
     changes even at failure)

     This is needed, even if `repo.wwrite` involves
     `dirstate.normallookup`, because `repo.wwrite` itself may be
     invoked in FORKED processes.

In fact, when `partial` argument of `merge.update` is specified
explicitly, (4) is needed, even if one of other than (4) is applied,
because updating dirstate (at #2 above) is omitted in such case (this
is used as a kind of "revert").

Then, (4) seems suitable for "critical region" in `hg update`.

3. Users of `merge.update`

There are some cases using `merge.update` directly or indirectly
(histedit and so on, too)

      with repo.wlock() as wlock:
        if linearmerging:
          # critical region >>>>
          merge.update('for merging linearly') #1
          repo.setparents() #2
          # critical region <<<<

          if conflicted:
            return 1 # expecting dirstate.write
          elif not commit:
            return 0 # expecting dirstate.write
          merge.update('to the revision to be backed out') #3
          cmduit.revert('to backout') #4

        repo.commit('the revision backing out') #5

         if merging:
           merge.update('to the original parent') #6
           merge.update('for merging with the revision backing out') #7

      with repo.wlock() as wlock:
        for rev in revs:
          merge.graft('to graft rev'):
            # critical region >>>>
            merge.update('for pseudo merging with graft source') #1
            repo.setparents() #2
            dirstate.copy('to duplicate copy information') #3
            # critical region <<<<

          if conflict:
            throw abort() # expecting dirstate.write

          repo.commit('the revision grafted') #4
    (BTW, `merge.update` isn't yet included into "critical region" of
    `hg graft`)

      with repo.wlock() as wlock:
        merge.update('to destination') #1

        for rev in revs:
          # critical region >>>>
          merge.update('for merging the revision to be rebased') #2
          dirstate.copy('to duplicate copy information') #3
          # critical region <<<<

          if conflict:
            raise InterventionRequired # expecting dirstate.write

          with dirstateguard('rebase'):
            repo.setparents('to drop 2nd parent of pseudo merge') #4
            repo.commit('the revision rebased') #5
    (BTW, "critical region" of `hg rebase` isn't yet guarded at all)

We can find something below out:

  a. `merge.update` may be invoked multiple times in a wlock scope

     This also means that "critical region" may be also invoked
     multiple times in a wlock scope, and increases complexity of "a
     kind of the inverse of `dirstateguard` or so".

  b. additional processing may be needed after `merge.update` to use
     result of it correctly

     Result of each commands isn't valid, if a command is aborted in
     "critical region". For example, lack of `repo.setparents()` at #2
     of `hg graft` may cause committing invalid revision at subsequent
     `hg graft --continue`.

     So, dirstate changes in "critical region" should be discarded at

Then, enclosing "critical region" by `dirstateguard` seems easier than
"a kind of the inverse of `dirstateguard` or so" to restore dirstate
at failure.

4. Conclusion

According to cases above, "restore dirstate at failure of command (or
in wlock scope)" policy doesn't seem suitable for current Mercurial

Then, what about the policy below ?

  1. treat dirstate changes in transaction as "all or nothing"

     - save dirstate just before starting transaction, and
     - restore it if transaction is rolled back

  2. keep dirstate changes outside transaction (and write them out)
     even at failure

  3. enclose "critical region" by `dirstateguard` to discard dirstate
     changes at failure

  4. but in `merge.update`, apply `dirstate.normallookup` explicitly
     on files to be updated before entering into "critical region"
     instead of enclosing it by `dirstateguard` because:

     - `repo.wwrite` in FORKED processes doesn't write
       `normallookup`-ed result into `.hg/dirstate`, and

     - explicit `partial` argument of `merge.update` always omits
       updating dirstate

[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp

More information about the Mercurial-devel mailing list