[PATCH 3 of 6] dirstate: read from `.pending` file under HG_PENDING mode
pierre-yves.david at ens-lyon.org
Sun May 24 16:16:21 CDT 2015
On 05/23/2015 03:08 PM, FUJIWARA Katsunori wrote:
> At Fri, 22 May 2015 11:57:50 -0500,
> Pierre-Yves David wrote:
>> On 05/22/2015 11:34 AM, FUJIWARA Katsunori wrote:
>>> At Thu, 21 May 2015 16:59:47 -0500,
>>> Pierre-Yves David wrote:
>>>> On 05/19/2015 11:42 AM, FUJIWARA Katsunori wrote:
>>>>> # HG changeset patch
>>>>> # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
>>>>> # Date 1432051569 -32400
>>>>> # Wed May 20 01:06:09 2015 +0900
>>>>> # Node ID e830ef506b9aaa01c5406d100a385961c4c0dff0
>>>>> # Parent 2d2eff9ee053025e8e0267b8d6c6d07326f607ee
>>>>> dirstate: read from `.pending` file under HG_PENDING mode
>>>>> True/False value of `_pendingmode` means whether `dirstate.pending` is
>>>>> used to initialize own `_map` and so on. When it is None, neither
>>>>> `dirstate` nor `dirstate.pending` is read in yet.
>>>>> This is used to keep consistent view between `_pl()` and `_read()`.
>>>>> Once `_pendingmode` is determined by reading one of `dirstate` or
>>>>> `dirstate.pending` in, `_pendingmode` is kept even if `invalidate()`
>>>>> is invoked. This should be reasonable, because:
>>>>> - effective `invalidate()` invocation should occur only in wlock scope, and
>>>>> - wlock can't be gotten under HG_PENDING mode
>>>>> For review-ability, this patch focuses only on reading `.pending` file
>>>>> in. Then, `dirstate.write()` can write changes out even under
>>>>> HG_PENDING mode. But it is still safe enough, because there is no code
>>>>> path writing `.pending` file out yet (= `_pendingmode` never be True).
>>>>> `_trypending()` is defined as a normal function to factor similar code
>>>>> path (in bookmarks and phases) out in the future easily.
>>>> This series is poking in the right direction, but it is failing to
>>>> acknowledge the following list of principle and will not result in a
>>>> working behavior.
>>>> 1) '.pending' files are made necessary by the existence of a transaction,
>>>> 2) HG_PENDING (and .pending files usage) must -only- occured in the
>>>> context of a transaction.
>>>> 3) The transaction object is the one handle the '.pending' files and
>>>> associated non-pending file. This includes:
>>>> 3.1) The transaction must purge any existing '.pending' when it open/close
>>>> 3.2) generation of .pending files is done through the transaction
>>>> 3.3) writing of the final files is also done through the transaction
>>>> *) We now have a developer warning feature, we should use it more.
>>>> Let me details this more.
>>> Thank you for detailed explanation !
>>> And, perhaps, should I post the patch below not as the first one of
>>> next series but as the last one of this series ?
>>> I put this into next series, because this seemed to increase
>>> review-ability of client code paths (= making in-memory changes
>>> visible to external process) of these functions in next series.
>>> diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
>>> --- a/mercurial/dirstate.py
>>> +++ b/mercurial/dirstate.py
>>> @@ -632,6 +632,35 @@ class dirstate(object):
>>> + def delayupdate(self, tr):
>>> + '''Delay visibility of dirstate to other readers
>>> + '''
>>> + category = 'dirstate-%i' % id(self)
>>> + tr.addpending(category, self._writepending)
>>> + # pending file should be fixed up regardless of outcome of transaction
>>> + tr.addfinalize(category, self._fixuppending)
>>> + tr.addabort(category, self._fixuppending)
>>> + def makependingvisible(self, repo):
>>> + '''Make pending disrtate changes visible to external process
>>> + This is an utility to treat pending changes regardless of
>>> + transaction activity.
>>> + This returns the value to be set to 'HG_PENDING' environment
>>> + variable for external process.
>>> + '''
>>> + tr = repo.currenttransaction()
>>> + if tr:
>>> + self.delayupdate(tr)
>>> + if tr.writepending():
>>> + return repo.root # set 'HG_PENDING' only in this case
>>> + else:
>>> + self.write()
>>> + return ""
>>> def _removepending(self):
>>> This assumes that:
>>> - `dirstate.delayupdate()` will be used in certain transaction scope
>>> (e.g. for `pretxncommit` hook), like `changelog.delayupdate()`
>>> - `dirstate.makependingvisible()` will be used in other code paths
>>> (e.g. for commit editor, `precommit` hook and so on), where
>>> transaction activity is determined only at runtime.
>>> I think that using these functions can satisfy rules you described
>>> above. How about this ?
>>> I chose `addpending()` instead of `addfilegenerator()` to write
>>> dirstate pending file out, because assumption below of the latter
>>> seems not suitable for dirstate characteristic:
>>> - changes will be written out at successful `transaction.close()`
>>> - pending file will be discarded at transaction failure
>>> As described in patch #4 of this series, dirstate changes should be
>>> written out (at releasing wlock) regardless of outcome of transaction
>>> in almost all cases.
>> This assumption are a surprise to me. Can you explain why we do not want
>> change to the dirstate that at "done withing a transaction" to be
>> rollbacked during that transaction.
>> Practical example will probably help to transfer your knowledge in my
>> direction :)
> I categorized cases below:
> 1. changing dirstate inside transaction:
> - qpush
> - amend
> - import
> now, dirstate changes inside transaction are explicitly
> discarded at failure by `dirstateguard`
> => OK, we can use `addfilegenerator()` in this case !
> - transplant
> - unshelve
> dirstate is always written out at releasing wlock, and:
> - for InterventionRequired or TransplantError, transaction is
> manually `close`-ed
> - for other exceptions, transaction is rollback-ed
> => Can we omit writing dirstate changes out at failure (= the
> latter above) in this case ?
> IMHO, probably "yes", because current dirstate parents
> already become "unknown" by rollback-ing changelog.
Okay! Sounds like:
- We agree on using addfilegenerator in these cases.
- This cases are defined by "dirstate change happen inside transaction.
What exactly define "dirstate change"? Call to write? opening a dirstate
guard? There is some stuff in (2) that makes me doubt of the second
point (how easy defined they are)
> 2. changing dirstate inside wlock, but outside (= before) transaction:
If the dirstate change happen -before- the transaction. Why is it now
written out before the transaction is open and why would we want to
rollback such change if it is still valid (since it point to stuff
outside the transaction?)
Are we writing the dirstate at wlock release only? And if so, why? Is it
a general principle as an attempt to prevent it to be visible while
pointing to in-transaction content (addressed but (1) then). Is it an
attempt to avoid issue related to transaction rollback? (also addressed
by (1) ?).
I feel like that if the dirstate is still valid after an operation
fails, it is the responsibility of the command logic to restore it to a
previous state if necessary.
If I missed any case where it make sense (in absolute, not as currently
implemented) to wait for wlock release before writing, please details.
> => We should keep changes regardless of outcome of transaction in
> this case for subsequent operations, like `--continue` or so
> (e.g. `repo.commit()` may fail for pretxncommit hook detecting
> violation of team specific rules)
> This means that changes held in `.pending` are still needed
> even if transaction is rollback-ed.
> - backout
> - graft
> - histedit
> - rebase (*1)
> (*1) Strictly speaking, `dirstateguard` saves changes before
> `repo.commit()`, so we can use `addfilegenerator()` for
> rebase. But let's forget this to make the problem simple :-)
I'm a bit confused here. These operation likely do multiple dirstate
Let's look at my expectation for graft:
for each revs:
dirstate.change('all new file and content') #1
raise InterventionRequired → exit
node = createcommit()
Is the first one really outside its a transaction or is graft wrapping
all whole iteration of the loop in its own transaction
The second one seems to fall in (case 1).
Let's not look at my expectation for rebase:
for each revs:
dirstate.change('all new file and content') #2
raise InterventionRequired → exit
node = createcommit()
The first one must obviously be preserved. It will be rolled back by `hg
rebase --abort` if needed.
#2 and #3 are similar to graft.
(at this point, you likely explain me that I know nothing about how
dirstate works :-) )
Writing down these two examples seems point at:
Sometimes there is a transaction open, and we touch the dirstate to
record changes to working copy files. This must not be rolled back with
Sometimes there is a transaction open and we update the dirstate to
point at new parents. But this parent are not "new stuff in the
transaction" (actually, the transaction is open, but nothing happened in
it yet). This must not be rolled bad with the transaction
Am I right? If I'm right, this probably mean case (1) is defined by
"dirstate parents point inside the transaction" which will probably be a
bit more tricky to get right.
> IMHO, we should mainly think about (2) case above.
> It is fact that invoking `dirstate.write()` explicitly before
> `repo.commit()` or so makes (2) case suitable for use of
> `addfilegenerator()`. In other words, "let's use clean dirstate inside
> transaction !".
I'm not sure what you mean here, but the previous round of question
probably cover that.
> But on the other hand, it is a little inefficient, because dirstate
> changes are written out at each continuous `repo.commit()` invocations
> (while graft, histedit and rebase), even if external process is never
What is the cost of writing this dirstate change? I feel like all the
other operation done by these command dominate the dirstate writing.
More information about the Mercurial-devel