/!\ This page is primarily intended for Mercurial's developers.

Dirstate Transaction Plan

Status: Completed

Main proponents: KatsunoriFujiwara

This page mainly focuses on how dirstate should be treated for consistency, especially around scope boundary of transaction and wlock at failure.

1. Summary

Tasks that we should do:

2. Failure in transaction scope

from:

2.1. `hg commit` case

Let's see typical case `hg commit` with pseudo-code.

Lines ending with "#N" means "dirstate is changed here". Others may
cause aborting. Scope of wlock/transaction is described in "with xxxx
as yyyy:" style for convenience::

  commands.commit():
    if opts['addremove']:
      scmutil.addremove():
        dirstate.update('files {added|removed|copied}') #1
        # involving dirstate.write() at wlock.release()
  
    repo.commit():
      with repo.wlock() as wlock:
        repo.status('to get current status') #2
  
        editor('for commit message')
        wctx.sub(s).commit('recursively') for subs
        hook('precommit')
  
        with repo.transaction('commit') as tr:
          node = createcommit()
          hook('pretxncommit')
          dirstate.change('to mark as committed') #3
          hook('pretxnclose')

Changes at #3 above should be discarded at failure after it for
consistency of subsequent `hg status` and so on, because:

- `dirstate.parents` refers rolled back (= not-existing) revision
- files are already `dirstate.normal`-ed

But in some cases, changes at #1 and/or #2 should be kept.

2.2. `hg rebase` case

Let's see `hg rebase` case::

  rebase():
    with repo.wlock() as wlock:
      merge.update('to rebase destination'):
        dirstate.update(destination) #1
      for each revs:
        merge.update('to merge from rebased one'):
          dirstate.change('all new file and content') #2
        if conflict:
          raise InterventionRequired # expecting dirstate.write
        repo.commit():
          (snip)
          with repo.transaction() as tr:
            node = createcommit()
            hook('pretxncommit')
            dirstate.change('to mark as committed') #3
            hook('pretxnclose')

To make `--continue` work as expected, dirstate changes should be
kept, if transaction is aborted at (or before) pretxncommit hook
above.

2.3. Categorize cases

Then, let me categorize cases. "touching file status" and "updating to
new parents" below are corresponded to #2 and #3 in rebase case.

(strictly speaking, #3 in rebase case implies also "touching file
status", but its main purpose should be "updating to new parents")

- both of "touching file status" and "updating to new parents" occur
  inside transaction

  - amend commit
  - import
  - qpush
  - transplant
  - unshelve

  In this case, all changes on dirstate should be discarded at
  failure of transaction, because discarding newly added revisions
  makes recent dirstate (especially, parents of it) meaningless.

  Aborting by InterventionRequired or TransplantError is treated as
  "intentional failure" (= a kind of success), because transaction
  is manually `close`-ed at that time.

- "updating to new parents" occurs inside transaction, but "touching
  file status" occurs outside

  - backout
  - graft
  - histedit
  - rebase

  In this case, for subsequent `--continue` (or manually committing
  for backout), "touching file status" must be kept, even at failure
  of transaction, but "updating to new parents" must not.


If writing dirstate changes out at other than `wlock.release` is
reasonable enough, this can be simplified as:

  write dirstate changes out at opening and closing the transaction,
  but restore it from saved one at unexpected failure of transaction

2.4. Conclusion about failure in transaction scope

1. write in-memory changes at the beginning of the transaction, to
   restore it at failure of transaction easily

   for visibility to external hooks, just before "pretxnopen" seems
   reasonable.

2. use `addfilegenerator()` to make in-memory changes inside
   transaction visible to external hooks

3. in-memory changes inside transaction should be written out into
   `dirstate.pending`, even if `dirstate.write()` is explicitly
   invoked

   While transaction, `.hg/dirstate` can be kept as one at (1) above
   by (2) and (3).

4. `dirstate.pending` (= changes inside transaction) should be:

   - renamed to `dirstate` at success of transaction

   - removed at failure of transaction
     (this is equivalent to restoring dirstate to one at (1) above)


This policy can also:

- avoid that inconsistent dirstate is written out at failure

  e.g. aborted by pretxnclose hook at #3 in rebase case

- make `dirstateguard` for qpush, amend and import useless 

  restoring dirstate at failure should be done as a part of aborting
  transaction

3. Ambiguous timestamp issue

from:

Tasks other than #Finding buggy `localrepo.wwrite()` invocation were done by the series below:

3.1. Outline of issue

To detect change of a file without redundant comparison of file
content, dirstate recognizes a file as certainly clean, if:

1. it is already known as "normal",
2. dirstate entry for it has valid (= not "-1") timestamp, and
3. mode, size and timestamp of it on the filesystem are as same as
   ones expected in dirstate

This works as expected in many cases, but doesn't in the corner case
that changing a file keeps mode, size and timestamp of it on the
filesystem.

The timetable below shows steps in typical issue case::

  ---- ----------------------------------- ----------------
                                           timestamp of "f"
                                           ----------------
                                           dirstate    file
  time          action                     mem  file   system
  ---- ----------------------------------- ---- ----- -----
  N                                              -1    ***
       - change file "f"                                N
  
       - execute `hg foobar`
         - get wlock
         - instanciate dirstate             -1   -1
         - `dirstate.normal("f")`           N    -1
           (e.g. via dirty check)
         - overwrite "f", but keep size                 N
  N+1
         - release wlock
           - `dirstate.write()`             N    N
  
       - `hg status` shows "f" as "clean"   N    N      N
  ---- ----------------------------------- ---- ----- -----

The most important point is that `dirstate.write()` is executed at N+1
or later. This causes writing dirstate timestamp N of "f" out
successfully. If it is executed at N, `parsers.pack_dirstate()`
replaces timestamp N with "-1" before actual writing dirstate out.

Occasional test failure for unexpected file status is typical example
of this corner case. Batch execution with small working directory is
finished in no time, and rarely satisfies condition (2) above.

If `dirstate.normallookup()` or so is certainly applied on "f" before
releasing wlock, this issue doesn't occur. But there are some cases
omitting `dirstate.normallookup` for changed files:

- `hg revert --rev REV` (intentionally)
- `hg import` using `patch.patch()` (intentionally)
- `merge.update()` with explicit `partial` (intentionally)
- failure of `merge.update()` before `merge.rrecordupdates()` (unintentionally)

3.2. Fixing issue

The root cause of this issue seems that nested wlock scopes prevents
`wlock.release()` in `workingctx._checklookup()` from writing dirstate
changes out::

        # update dirstate for files that are actually clean
        if fixup:
            try:
                # updating the dirstate is optional
                # so we don't wait on the lock
                # wlock can invalidate the dirstate, so cache normal _after_
                # taking the lock
                wlock = self._repo.wlock(False)
                normal = self._repo.dirstate.normal
                try:
                    for f in fixup:
                        normal(f)
                finally:
                    wlock.release()
            except error.LockError:
                pass

`workingctx._checklookup()` is invoked via `repo.status()` or so.

Then, putting explicit `dirstate.write()` at the end of wlock scope
above can fix this issue::

  ---- ----------------------------------- ----------------
                                           timestamp of "f"
                                           ----------------
                                           dirstate    file
  time          action                     mem  file   system
  ---- ----------------------------------- ---- ----- -----
  N                                              -1    ***
       - change file "f"                                N
  
       - execute `hg foobar`
         - get wlock
         - instanciate dirstate             -1   -1
         - `dirstate.normal("f")`           N    -1
           (e.g. via dirty check)
       ----------------------------------- ---- ----- -----
         - `dirstate.write()`               -1   -1
       ----------------------------------- ---- ----- -----
         - overwrite "f", but keep size                 N
  N+1
         - release wlock
           - `dirstate.write()`             -1   -1
  
       - `hg status` shows "f" as "clean"   -1   -1     N
  ---- ----------------------------------- ---- ----- -----

When `dirstate.write()` is executed at N+1, overwriting "f" causes
timestamp N+1 of "f" on the filesystem, and timestamp N of "f" in
dirstate doesn't cause this issue.

3.3. Additional topics

3.3.1. Issue at `hg commit -i`

(newly added at writing this page, as a memo)

`cmdutil.dorecord()` used via `hg commit -i` cuases similar issue,
even if `dirstate.write()` is explicitly invoked in
`workingctx._checklookup()`, because `cmdutil.dorecord()` does:

0. (it is assumed that timestamp of file on the filesystem is N)
1. backup file, which may be committed partially, with keeping timestamp
2. select diff hunks of current changes
3. revert file to be partially committed by `merge.update`
4. apply selected hunks on reverted file
5. commit changes (= mark committed file as "clean")
6. write dirstate changes into .hg/dirstate
6. restore file from backup at (1)
7. restore timestamp of file from backup at (1)

Subsequent `hg status` doesn't recognize partially committed file as
modified, if:

- mode and size of the file isn't changed at (0) and (4)
- (1), (2), (3) and (4) are executed at timestamp N, but
- (6) is executed at N+1 or later

3.3.2. Finding buggy `localrepo.wwrite()` invocation

from:

Adding the following two lines to the top of `localrepo.wwrite()`
seems helpful for finding bugs::

  if self.dirstate._dirty:
      util.develwarn("writing file to disk while dirstate is dirty")

4. Failure in wlock scope

from:

4.1. `hg commit` case

Let's see typical `hg commit` case with pseudo-code.

  commands.commit():
    if opts['addremove']:
      scmutil.addremove():
        dirstate.update('files {added|removed|copied}') #1
        # involving dirstate.write() at wlock.release()
  
    repo.commit():
      with repo.wlock() as wlock:
        repo.status('to get current status') #2
  
        editor('for commit message')
        wctx.sub(s).commit('recursively') for subs
        hook('precommit')
  
        with repo.transaction('commit') as tr:
          node = createcommit()
          hook('pretxncommit')
          dirstate.update(node) #3
          hook('pretxnclose')

"Changes at #3 above" was already discussed in "Failure in transaction scope".

Then, how should we treat #1 and #2 ?

1. restore dirstate at failure of command

   This policy isn't suitable for `hg commit --addremove` case,
   because dirstate changes at #1 above should be kept for backward
   compatibility, even if `hg commit` itself fails.

   At least, the way to avoid restoring dirstate at failure (a kind
   of the inverse of `dirstateguard` ?) is required for this policy.

   For example, commands below expect that dirstate changes are kept
   even if command itself is terminated with non-0 exit code
   (detailed flows are shown in subsequent section):

   - aborting by exception
     (`hg graft`, `hg rebase` and so on)

   - terminating normally with non-0 exit code
     (conflicts at `hg backout` without --merge)

2. restore dirstate at failure in wlock scope

   Changes at #2 (just `normal`-ing) can be discarded at failure in
   wlock scope.

   This policy is suitable at least for `hg commit` case.

   But some other commands expect that dirstate changes are kept
   even if command itself is aborted by exception, as described
   above.

   So, the way to avoid restoring dirstate at failure is required
   for this policy, too. In addition to it, code path enclosed by
   `repo.wlock()` + `wlock.release` may be nested at runtime, and
   this may increase complexity.

3. keep dirstate changes (and write them out) even at failure

   This is also suitable for `hg commit` case, because this is as
   same as current behavior :-)

Let's see other cases, too.

4.2. `merge.update()` users

There are some cases using `merge.update()` directly or indirectly
(histedit and so on, too)::

  commands.backout():
    with repo.wlock() as wlock:
      if linearmerging:
        # critical region >>>>
        merge.update('for merging linearly') #1
        repo.setparents() #2
        # critical region <<<<
  
        if conflicted:
          return 1 # expecting dirstate.write
        elif not commit:
          return 0 # expecting dirstate.write
      else:
        merge.update('to the revision to be backed out') #3
        cmduit.revert('to backout') #4
  
      repo.commit('the revision backing out') #5
  
       if merging:
         merge.update('to the original parent') #6
         merge.update('for merging with the revision backing out') #7


  commands.graft():
    with repo.wlock() as wlock:
      for rev in revs:
        merge.graft('to graft rev'):
          # critical region >>>>
          merge.update('for pseudo merging with graft source') #1
          repo.setparents() #2
          dirstate.copy('to duplicate copy information') #3
          # critical region <<<<
  
        if conflict:
          throw abort() # expecting dirstate.write
  
        repo.commit('the revision grafted') #4

(BTW, `merge.update()` isn't yet correctly included into "critical
region" of `hg graft`)

  rebase.rebase():
    with repo.wlock() as wlock:
      merge.update('to destination') #1
  
      for rev in revs:
        # critical region >>>>
        merge.update('for merging the revision to be rebased') #2
        dirstate.copy('to duplicate copy information') #3
        # critical region <<<<
  
        if conflict:
          raise InterventionRequired # expecting dirstate.write
  
        with dirstateguard('rebase'):
          repo.setparents('to drop 2nd parent of pseudo merge') #4
          repo.commit('the revision rebased') #5

(BTW, "critical region" of `hg rebase` isn't yet correctly guarded at
all)

On the other hand, `merge.update()` itself has many points which may
change dirstate and cause aborting at runtime::

  merge.update():
    with repo.wlock() as wlock:
      wctx.files() / wctx.dirty() #1
      hook('preupdate')
  
      calculateupdates()
  
      repo.wwrite('all files to be updated')
      wctx.sub(s).get('update recursively') for subs
      dirstate.change('all updated files') #2
  
      hook('postupdate')

Aborting in "critical region" above causes the issue that inconsistent
dirstate is written out.

There are some solutions to fix this issue:

1. restore dirstate at failure of command

2. restore dirstate at failure in wlock scope

3. enclose "critical region" by `dirstateguard` to restore dirstate
   at failure (but keep other dirstate changes even at failure)

According to `merge.update()` user cases, we can find something below
out:

a. `merge.update()` may be invoked multiple times in a wlock scope

   This also means that "critical region" may be also invoked
   multiple times in a wlock scope, and increases complexity of "a
   kind of the inverse of `dirstateguard` or so".

b. additional processing may be needed after `merge.update()` to use
   result of it correctly

   Result of each commands isn't valid, if a command is aborted in
   "critical region".

   For example, lack of `repo.setparents()` at #2 of `hg graft` may
   cause committing invalid revision at subsequent `hg graft
   --continue`.

   So, dirstate changes in "critical region" should be discarded at
   failure.

Then, enclosing "critical region" by `dirstateguard` seems easier than
"a kind of the inverse of `dirstateguard` or so" to restore dirstate
at failure.

4.3. Conclusion about failure in wlock scope

According to cases above, "restore dirstate at failure of command (or
in wlock scope)" policy doesn't seem suitable for current Mercurial
implementation.

Then, what about the policy below ?

1. keep dirstate changes outside transaction (and write them out)
   even at failure

2. enclose "critical region" by `dirstateguard` to discard dirstate
   changes at failure

4.4. Additional topics

4.4.1. Detection of aborting in critical region

(newly added at writing this page, as a memo)

`merge.update()` keeps `.hg/updatestate` while "critical region"
below::

  merge.update():
    with repo.wlock() as wlock:
      wctx.files() / wctx.dirty() #1
      hook('preupdate')
  
      calculateupdates()
  
      # critical region >>>>
      repo.wwrite('all files to be updated')
      wctx.sub(s).get('update recursively') for subs
      dirstate.change('all updated files') #2
      # critical region <<<<
  
      hook('postupdate')

This prevents `hg graft --continue` and so on from resuming on
inconsistent status, when previous `merge.update()` was aborted in
critical region of it.

But in fact, actual "critical region" of graft is wider than one of
`merege.update()`::

  commands.graft():
    with repo.wlock() as wlock:
      for rev in revs:
        merge.graft('to graft rev'):
          # critical region >>>>
          merge.update('for pseudo merging with graft source') #1
          repo.setparents() #2
          dirstate.copy('to duplicate copy information') #3
          # critical region <<<<
  
        if conflict:
          throw abort() # expecting dirstate.write
  
        repo.commit('the revision grafted') #4

For example, if previous `merge.update()` is aborted at
`hook('postupdate')`, `hg graft --continue` can resume grafting, even
though #2 and #3 in critical region of `hg graft` aren't executed in
this case.

How about them to prevent such situation ?

- introduce status information to indicate whether critical region is
  completed successfully or not (`graftmergingstate` or so)

- invoke `cmdutil.checkunfinished()` also at `hg graft --continue`

- extend `cmdutil.checkunfinished()` to omit own in-progress state

  for example, `cmdutil.checkunfinished(expected='graftstate')` at `hg
  graft --continue` is aborted by not `graftstate` but
  `graftmergingstate` (in addition to it, absence of `graftstate` may
  have to cause aborting)

backout (only `backoutmergingstate` or so), rebase, histedit and
tranplant should need these, too.

5. Detail about HG_PENDING mechanism

from:

See also TransactionPlan

This series is poking in the right direction, but it is failing to 
acknowledge the following list of principle and will not result in a 
working behavior.

1) '.pending' files are made necessary by the existence of a transaction,
2) HG_PENDING (and .pending files usage) must -only- occured in the 
   context of a transaction.
3) The transaction object is the one handle the '.pending' files and 
   associated non-pending file. This includes:

  3.1) The transaction must purge any existing '.pending' when it open/close
  3.2) generation of .pending files is done through the transaction
  3.3) writing of the final files is also done through the transaction

(*) We now have a developer warning feature, we should use it more.

Let me details this more.

5.1. Transaction make '.pending' necessary.

The goal of transaction is to prevent any external viewer to see its 
content before it is committed. So anything happening during the 
transaction is actually written to disk only at `tr.close()` time.

To lets hooks and tools see the content of the transaction, the 
'pending' mechanism is used. An environment variable let sub-invocation 
of hg know that it needs to looks for extra data in different file.

5.2. pending happen during transaction only

In the general case, we do not use the same 'pending' mechanism for 
dirstate related changes for the following reason:

- dirstate is related to the working copy. Most external viewer like 
  clone/pull/push do not care about the working copy so we are not at risk 
  of uncommited data escaping the repository.

- dirstate is related to the content of working copy (parents, but also 
  file content). We cannot have an atomic update of all the file in the 
  working copy so we'll have transient invalid state during the update 
  anyway, therefor we are not even trying to ensure we have an atomic 
  transaction for external viewer.

However, there is a case where we need dirstate to use a '.pending' 
mechanism: When the dirstate change makes it refer to content contained 
in a not-yet committed transaction.

- we do not want to make such dirstate visible until the transaction 
  content is visible.

- sub-invocation of mercurial in hook/tool need to be able to see this 
  content anyway.

To simplify things. I'm assuming::

   "dirstate refer to content in the transaction."

is the same as::

   "dirstate change happen during a transaction."


So, if your dirstate change happens inside a transaction, it should be 
collaborating with it. Your series have to make use of 
the`repo.currenttransaction()` method at least once somewhere.

5.3. The transaction object is the one handling file

To simplify entry points and ensure consistency, the transaction is 
handling all files involved in the transaction from end to end. So when 
dirstate collaborate with the transaction, it has to let the transaction 
handle the dirstate file


related to point 3.2: We are generating `pending` file on demand, 
because if no subprocess is to be call, we do not need to generated 
them. Every subcall that can need pending file will use call 
`tr.pending()` to trigger this generation. This is a single entry point 
that the code know and use. The function will also handle the case where 
there is no pending changes and not special logic is needed.

To get the transaction able to include your pending file, you have to 
register a "file generator" with `tr.addfilegenerator` the documentation 
is fairly extensive so I won't repeat it here. (if document is unclear 
we have to patch it).

I expect a call like::

   tr.addfilegenerator('dirstate', ('dirstate', self._write,
                       location='plain')

Where self._write is a method accepting an (already open) file object 
and writing dirstate content in it.

addfilegenerator can be called multiple time, each call replacing the other.

related to point 3.3: You have to let the transaction handle the final 
write too. The reason you have dirstate being handled by the transaction 
logic (if present) is that you never want to write the dirstate to disk 
(except pending) until the transaction data are visible too. And the 
dirstate/guard object have no way to know when this will happen. The 
transaction object is the one who know about that.

Lets looks at the following case::

   with tr:
       commit = newcommit()
       with ds:
            ds.setparent()
            ds.close() # confirm the dirstate change
       othercrap()
       tr.close() # commit the transaction

In this case the "ds.close()" is not the right time to write the file 
down, it is too early. It has to be written down at 'tr.close()' time.
By chance, the transaction will automatically take care of that since 
you used `addfilegenerator`.

summary: your .close() must not touch disk in the in-transaction case. 
Will handle it.

The final writing of the dirstate file will not be a simple move of the 
'.pending' file. This is necessary because the pending are lazily 
generated. Lets looks at the following scenario::

   with tr:
       commit = newcommit()
       with ds:
            ds.setparent()
            ds.close() # confirm the move
       tr.pending()
       with ds:
            ds.setparent()
            ds.close() # confirm the move

       tr.close() # commit the transaction

In this case, the .pending file is outdated compared to the in-memory 
state. So we need to regenerate the file anyway.

summary: transaction won't use simple rename.

related to point 3.1: You also have to be very careful of not leaving 
any '.pending' file around. Lets look at the following scenario::

  $ hg commit # write dirstate.pending, leave it behind
  $ hg up 42 # does not need to write a pending file
  $ hg pull # transaction call hook on pending change, dirstate is read
            # from .pending becaues it exists

So we have to make sure we do not leave the file behind. By chance, the 
transaction will take care of that for you. It keep track of all 
temporary file it generated and make sure they are cleaned up when the 
transaction is closed or rollbacked.

(We should probably aggressively nuke all known '.pending; type file at 
transaction open time but I'm not sure it is done yet.)

other nice thing: the transaction logic will also backup your file (and 
restore in rollback case) your dirstate file as soon as your register a 
generator for it.

5.4. Developer Warning:

To work your code (will) make multiple assumption:

- wlock is alway taken when we touch the dirstate,
- write are always done in a dirstate guard context,
- we do not do any direct writing if a transaction exists,

It is easy to check this is true, but we crashing in these case is 
suboptimal since it will affect human user of unsafe, but already 
existing code. By chance we now have a "developer warning concept" (that 
you get from `devel.all=true` config) and the default behavior for tests 
is now to run them. So adding code to detect and issue warning in this 
case will let the code base be cleaned up over time.


CategoryDeveloper CategoryNewFeatures

DirstateTransactionPlan (last edited 2017-08-21 21:11:31 by RodrigoDamazio)