[PATCH 3 of 6] dirstate: read from `.pending` file under HG_PENDING mode

Pierre-Yves David pierre-yves.david at ens-lyon.org
Sun May 24 16:16:21 CDT 2015

On 05/23/2015 03:08 PM, FUJIWARA Katsunori wrote:
> At Fri, 22 May 2015 11:57:50 -0500,
> Pierre-Yves David wrote:
>> On 05/22/2015 11:34 AM, FUJIWARA Katsunori wrote:
>>> At Thu, 21 May 2015 16:59:47 -0500,
>>> Pierre-Yves David wrote:
>>>> On 05/19/2015 11:42 AM, FUJIWARA Katsunori wrote:
>>>>> # HG changeset patch
>>>>> # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
>>>>> # Date 1432051569 -32400
>>>>> #      Wed May 20 01:06:09 2015 +0900
>>>>> # Node ID e830ef506b9aaa01c5406d100a385961c4c0dff0
>>>>> # Parent  2d2eff9ee053025e8e0267b8d6c6d07326f607ee
>>>>> dirstate: read from `.pending` file under HG_PENDING mode
>>>>> True/False value of `_pendingmode` means whether `dirstate.pending` is
>>>>> used to initialize own `_map` and so on. When it is None, neither
>>>>> `dirstate` nor `dirstate.pending` is read in yet.
>>>>> This is used to keep consistent view between `_pl()` and `_read()`.
>>>>> Once `_pendingmode` is determined by reading one of `dirstate` or
>>>>> `dirstate.pending` in, `_pendingmode` is kept even if `invalidate()`
>>>>> is invoked. This should be reasonable, because:
>>>>> - effective `invalidate()` invocation should occur only in wlock scope, and
>>>>> - wlock can't be gotten under HG_PENDING mode
>>>>> For review-ability, this patch focuses only on reading `.pending` file
>>>>> in. Then, `dirstate.write()` can write changes out even under
>>>>> HG_PENDING mode. But it is still safe enough, because there is no code
>>>>> path writing `.pending` file out yet (= `_pendingmode` never be True).
>>>>> `_trypending()` is defined as a normal function to factor similar code
>>>>> path (in bookmarks and phases) out in the future easily.
>>>> This series is poking in the right direction, but it is failing to
>>>> acknowledge the following list of principle and will not result in a
>>>> working behavior.
>>>> 1) '.pending' files are made necessary by the existence of a transaction,
>>>> 2) HG_PENDING (and .pending files usage) must -only- occured in the
>>>> context of a transaction.
>>>> 3) The transaction object is the one handle the '.pending' files and
>>>> associated non-pending file. This includes:
>>>> 3.1) The transaction must purge any existing '.pending' when it open/close
>>>> 3.2) generation of .pending files is done through the transaction
>>>> 3.3) writing of the final files is also done through the transaction
>>>> *) We now have a developer warning feature, we should use it more.
>>>> Let me details this more.
>>> Thank you for detailed explanation !
>>> And, perhaps, should I post the patch below not as the first one of
>>> next series but as the last one of this series ?
>>> I put this into next series, because this seemed to increase
>>> review-ability of client code paths (= making in-memory changes
>>> visible to external process) of these functions in next series.
>>>       ====================
>>>       diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
>>>       --- a/mercurial/dirstate.py
>>>       +++ b/mercurial/dirstate.py
>>>       @@ -632,6 +632,35 @@ class dirstate(object):
>>>                    return
>>>                self._writedirstate(self._filename)
>>>       +    def delayupdate(self, tr):
>>>       +        '''Delay visibility of dirstate to other readers
>>>       +        '''
>>>       +        category = 'dirstate-%i' % id(self)
>>>       +
>>>       +        tr.addpending(category, self._writepending)
>>>       +
>>>       +        # pending file should be fixed up regardless of outcome of transaction
>>>       +        tr.addfinalize(category, self._fixuppending)
>>>       +        tr.addabort(category, self._fixuppending)
>>>       +
>>>       +    def makependingvisible(self, repo):
>>>       +        '''Make pending disrtate changes visible to external process
>>>       +
>>>       +        This is an utility to treat pending changes regardless of
>>>       +        transaction activity.
>>>       +
>>>       +        This returns the value to be set to 'HG_PENDING' environment
>>>       +        variable for external process.
>>>       +        '''
>>>       +        tr = repo.currenttransaction()
>>>       +        if tr:
>>>       +            self.delayupdate(tr)
>>>       +            if tr.writepending():
>>>       +                return repo.root # set 'HG_PENDING' only in this case
>>>       +        else:
>>>       +            self.write()
>>>       +        return ""
>>>       +
>>>            def _removepending(self):
>>>                try:
>>>                    self._opener.unlink(self._pendingfilename)
>>>       ====================
>>> This assumes that:
>>>     - `dirstate.delayupdate()` will be used in certain transaction scope
>>>       (e.g. for `pretxncommit` hook), like `changelog.delayupdate()`
>>>     - `dirstate.makependingvisible()` will be used in other code paths
>>>       (e.g. for commit editor, `precommit` hook and so on), where
>>>       transaction activity is determined only at runtime.
>>> I think that using these functions can satisfy rules you described
>>> above. How about this ?
>>> I chose `addpending()` instead of `addfilegenerator()` to write
>>> dirstate pending file out, because assumption below of the latter
>>> seems not suitable for dirstate characteristic:
>>>     - changes will be written out at successful `transaction.close()`
>>>     - pending file will be discarded at transaction failure
>>> As described in patch #4 of this series, dirstate changes should be
>>> written out (at releasing wlock) regardless of outcome of transaction
>>> in almost all cases.
>> This assumption are a surprise to me. Can you explain why we do not want
>> change to the dirstate that at "done withing a transaction" to be
>> rollbacked during that transaction.
>> Practical example will probably help to transfer your knowledge in my
>> direction :)
> I categorized cases below:
>   1. changing dirstate inside transaction:
>      - qpush
>      - amend
>      - import
>        now, dirstate changes inside transaction are explicitly
>        discarded at failure by `dirstateguard`
>        => OK, we can use `addfilegenerator()` in this case !
>      - transplant
>      - unshelve
>        dirstate is always written out at releasing wlock, and:
>        - for InterventionRequired or TransplantError, transaction is
>          manually `close`-ed
>        - for other exceptions, transaction is rollback-ed
>        => Can we omit writing dirstate changes out at failure (= the
>           latter above) in this case ?
>           IMHO, probably "yes", because current dirstate parents
>           already become "unknown" by rollback-ing changelog.

Okay! Sounds like:

- We agree on using addfilegenerator in these cases.
- This cases are defined by "dirstate change happen inside transaction.

What exactly define "dirstate change"? Call to write? opening a dirstate 
guard? There is some stuff in (2) that makes me doubt of the second 
point (how easy defined they are)

>   2. changing dirstate inside wlock, but outside (= before) transaction:

If the dirstate change happen -before- the transaction. Why is it now 
written out before the transaction is open and why would we want to 
rollback such change if it is still valid (since it point to stuff 
outside the transaction?)

Are we writing the dirstate at wlock release only? And if so, why? Is it 
a general principle as an attempt to prevent it to be visible while 
pointing to in-transaction content (addressed but (1) then). Is it an 
attempt to avoid issue related to transaction rollback? (also addressed 
by (1) ?).

I feel like that if the dirstate is still valid after an operation 
fails, it is the responsibility of the command logic to restore it to a 
previous state if necessary.

If I missed any case where it make sense (in absolute, not as currently 
implemented) to wait for wlock release before writing, please details.

>      => We should keep changes regardless of outcome of transaction in
>         this case for subsequent operations, like `--continue` or so
>         (e.g. `repo.commit()` may fail for pretxncommit hook detecting
>         violation of team specific rules)
>         This means that changes held in `.pending` are still needed
>         even if transaction is rollback-ed.
>      - backout
>      - graft
>      - histedit
>      - rebase (*1)
>        (*1) Strictly speaking, `dirstateguard` saves changes before
>             `repo.commit()`, so we can use `addfilegenerator()` for
>             rebase. But let's forget this to make the problem simple :-)

I'm a bit confused here. These operation likely do multiple dirstate 

Let's look at my expectation for graft:

     for each revs:
         dirstate.change('all new file and content') #1
         if conflict:
           raise InterventionRequired → exit
           node = createcommit()
           dirstate.update(node) #2

Is the first one really outside its a transaction or is graft wrapping 
all whole iteration of the loop in its own transaction
The second one seems to fall in (case 1).

Let's not look at my expectation for rebase:

       dirstate.update(destination) #1
     for each revs:
         dirstate.change('all new file and content') #2
         if conflict:
           raise InterventionRequired → exit
           node = createcommit()
           dirstate.update(node) #3

The first one must obviously be preserved. It will be rolled back by `hg 
rebase --abort` if needed.
#2 and #3 are similar to graft.

(at this point, you likely explain me that I know nothing about how 
dirstate works :-) )

Writing down these two examples seems point at:

   Sometimes there is a transaction open, and we touch the dirstate to 
record changes to working copy files. This must not be rolled back with 
the transaction

   Sometimes there is a transaction open and we update the dirstate to 
point at new parents. But this parent are not "new stuff in the 
transaction" (actually, the transaction is open, but nothing happened in 
it yet). This must not be rolled bad with the transaction

Am I right? If I'm right, this probably mean case (1) is defined by 
"dirstate parents point inside the transaction" which will probably be a 
bit more tricky to get right.

> IMHO, we should mainly think about (2) case above.
> It is fact that invoking `dirstate.write()` explicitly before
> `repo.commit()` or so makes (2) case suitable for use of
> `addfilegenerator()`. In other words, "let's use clean dirstate inside
> transaction !".

I'm not sure what you mean here, but the previous round of question 
probably cover that.

> But on the other hand, it is a little inefficient, because dirstate
> changes are written out at each continuous `repo.commit()` invocations
> (while graft, histedit and rebase), even if external process is never
> invoked.

What is the cost of writing this dirstate change? I feel like all the 
other operation done by these command dominate the dirstate writing.

Pierre-Yves David

More information about the Mercurial-devel mailing list