[PATCH 3 of 6] dirstate: read from `.pending` file under HG_PENDING mode

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Sat May 23 15:08:04 CDT 2015


At Fri, 22 May 2015 11:57:50 -0500,
Pierre-Yves David wrote:
> 
> On 05/22/2015 11:34 AM, FUJIWARA Katsunori wrote:
> >
> > At Thu, 21 May 2015 16:59:47 -0500,
> > Pierre-Yves David wrote:
> >>
> >> On 05/19/2015 11:42 AM, FUJIWARA Katsunori wrote:
> >>> # HG changeset patch
> >>> # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> >>> # Date 1432051569 -32400
> >>> #      Wed May 20 01:06:09 2015 +0900
> >>> # Node ID e830ef506b9aaa01c5406d100a385961c4c0dff0
> >>> # Parent  2d2eff9ee053025e8e0267b8d6c6d07326f607ee
> >>> dirstate: read from `.pending` file under HG_PENDING mode
> >>>
> >>> True/False value of `_pendingmode` means whether `dirstate.pending` is
> >>> used to initialize own `_map` and so on. When it is None, neither
> >>> `dirstate` nor `dirstate.pending` is read in yet.
> >>>
> >>> This is used to keep consistent view between `_pl()` and `_read()`.
> >>>
> >>> Once `_pendingmode` is determined by reading one of `dirstate` or
> >>> `dirstate.pending` in, `_pendingmode` is kept even if `invalidate()`
> >>> is invoked. This should be reasonable, because:
> >>>
> >>> - effective `invalidate()` invocation should occur only in wlock scope, and
> >>> - wlock can't be gotten under HG_PENDING mode
> >>>
> >>> For review-ability, this patch focuses only on reading `.pending` file
> >>> in. Then, `dirstate.write()` can write changes out even under
> >>> HG_PENDING mode. But it is still safe enough, because there is no code
> >>> path writing `.pending` file out yet (= `_pendingmode` never be True).
> >>>
> >>> `_trypending()` is defined as a normal function to factor similar code
> >>> path (in bookmarks and phases) out in the future easily.
> >>
> >> This series is poking in the right direction, but it is failing to
> >> acknowledge the following list of principle and will not result in a
> >> working behavior.
> >>
> >> 1) '.pending' files are made necessary by the existence of a transaction,
> >> 2) HG_PENDING (and .pending files usage) must -only- occured in the
> >> context of a transaction.
> >> 3) The transaction object is the one handle the '.pending' files and
> >> associated non-pending file. This includes:
> >>
> >> 3.1) The transaction must purge any existing '.pending' when it open/close
> >> 3.2) generation of .pending files is done through the transaction
> >> 3.3) writing of the final files is also done through the transaction
> >>
> >> *) We now have a developer warning feature, we should use it more.
> >>
> >> Let me details this more.
> >
> > Thank you for detailed explanation !
> >
> > And, perhaps, should I post the patch below not as the first one of
> > next series but as the last one of this series ?
> >
> > I put this into next series, because this seemed to increase
> > review-ability of client code paths (= making in-memory changes
> > visible to external process) of these functions in next series.
> >
> >      ====================
> >      diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
> >      --- a/mercurial/dirstate.py
> >      +++ b/mercurial/dirstate.py
> >      @@ -632,6 +632,35 @@ class dirstate(object):
> >                   return
> >               self._writedirstate(self._filename)
> >
> >      +    def delayupdate(self, tr):
> >      +        '''Delay visibility of dirstate to other readers
> >      +        '''
> >      +        category = 'dirstate-%i' % id(self)
> >      +
> >      +        tr.addpending(category, self._writepending)
> >      +
> >      +        # pending file should be fixed up regardless of outcome of transaction
> >      +        tr.addfinalize(category, self._fixuppending)
> >      +        tr.addabort(category, self._fixuppending)
> >      +
> >      +    def makependingvisible(self, repo):
> >      +        '''Make pending disrtate changes visible to external process
> >      +
> >      +        This is an utility to treat pending changes regardless of
> >      +        transaction activity.
> >      +
> >      +        This returns the value to be set to 'HG_PENDING' environment
> >      +        variable for external process.
> >      +        '''
> >      +        tr = repo.currenttransaction()
> >      +        if tr:
> >      +            self.delayupdate(tr)
> >      +            if tr.writepending():
> >      +                return repo.root # set 'HG_PENDING' only in this case
> >      +        else:
> >      +            self.write()
> >      +        return ""
> >      +
> >           def _removepending(self):
> >               try:
> >                   self._opener.unlink(self._pendingfilename)
> >      ====================
> >
> > This assumes that:
> >
> >    - `dirstate.delayupdate()` will be used in certain transaction scope
> >      (e.g. for `pretxncommit` hook), like `changelog.delayupdate()`
> >
> >    - `dirstate.makependingvisible()` will be used in other code paths
> >      (e.g. for commit editor, `precommit` hook and so on), where
> >      transaction activity is determined only at runtime.
> >
> > I think that using these functions can satisfy rules you described
> > above. How about this ?
> >
> >
> > I chose `addpending()` instead of `addfilegenerator()` to write
> > dirstate pending file out, because assumption below of the latter
> > seems not suitable for dirstate characteristic:
> >
> >    - changes will be written out at successful `transaction.close()`
> >    - pending file will be discarded at transaction failure
> >
> > As described in patch #4 of this series, dirstate changes should be
> > written out (at releasing wlock) regardless of outcome of transaction
> > in almost all cases.
> 
> This assumption are a surprise to me. Can you explain why we do not want 
> change to the dirstate that at "done withing a transaction" to be 
> rollbacked during that transaction.
> 
> Practical example will probably help to transfer your knowledge in my 
> direction :)

I categorized cases below:

 1. changing dirstate inside transaction:

    - qpush
    - amend
    - import
      now, dirstate changes inside transaction are explicitly
      discarded at failure by `dirstateguard`

      => OK, we can use `addfilegenerator()` in this case !

    - transplant
    - unshelve
      dirstate is always written out at releasing wlock, and:

      - for InterventionRequired or TransplantError, transaction is
        manually `close`-ed
      - for other exceptions, transaction is rollback-ed

      => Can we omit writing dirstate changes out at failure (= the
         latter above) in this case ?

         IMHO, probably "yes", because current dirstate parents
         already become "unknown" by rollback-ing changelog.

 2. changing dirstate inside wlock, but outside (= before) transaction:

    => We should keep changes regardless of outcome of transaction in
       this case for subsequent operations, like `--continue` or so

       (e.g. `repo.commit()` may fail for pretxncommit hook detecting
       violation of team specific rules)

       This means that changes held in `.pending` are still needed
       even if transaction is rollback-ed.

    - backout
    - graft
    - histedit
    - rebase (*1)

      (*1) Strictly speaking, `dirstateguard` saves changes before
           `repo.commit()`, so we can use `addfilegenerator()` for
           rebase. But let's forget this to make the problem simple :-)

IMHO, we should mainly think about (2) case above.

It is fact that invoking `dirstate.write()` explicitly before
`repo.commit()` or so makes (2) case suitable for use of
`addfilegenerator()`. In other words, "let's use clean dirstate inside
transaction !".

But on the other hand, it is a little inefficient, because dirstate
changes are written out at each continuous `repo.commit()` invocations
(while graft, histedit and rebase), even if external process is never
invoked.


> >> Developer Warning:
> >> -----------------
> >>
> >> To work your code (will) make multiple assumption:
> >>
> > [snip]
> >> - write are always done in a dirstate guard context,
> >
> > Even though `dirstateguard` is typical `dirstate.write()` caller, I
> > don't assume above. But are there something to indicate or suggest so ?
> > I may overllook them :-<
> 
> I'm far less familiar with this topic than you. But I cannot think of a 
> case were we would want to update the dirstate content without some sort 
> of transactional protect that guard is offering.
> 
> Do you have an example ?

AFAIK, there are still some explicit `dirstate.write()` invocations
outside transaction/dirstateguard scope:

  - in `rebase.rebasenode()`
  - in `strip.strip()` and `strip.stripcmd()`

    These seem to try making in-memory changes visible to external
    process spawned at subsequent `merge.update()` (or, save
    intermediate changes for safety ?).

    IMHO, these can be removed after making in-memory dirstate changes
    visible :-)

  - in `merge.graft()` (used by graft and histedit)

    This wants to ensure dropping the second merge parent, even if
    subsequent process is aborted.

    This is needed, if `copies.duplicatecopies()` may fail in some
    cases (or we may have to replace it by `dirstateguard` or so, to
    deprecate explicit `dirstate.write` invocation)


> -- 
> Pierre-Yves David
> 


----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp


More information about the Mercurial-devel mailing list