[PATCH 2 of 5] localrepo: invoke dirstate.unsureifambig in wwrite for safety
FUJIWARA Katsunori
foozy at lares.dti.ne.jp
Mon Jun 22 11:19:34 CDT 2015
At Fri, 12 Jun 2015 02:55:04 +0900,
FUJIWARA Katsunori wrote:
>
> At Wed, 10 Jun 2015 11:32:25 -0700,
> Pierre-Yves David wrote:
> >
> >
> >
> > On 06/10/2015 09:39 AM, Martin von Zweigbergk wrote:
> > >
> > >
> > > On Tue, Jun 9, 2015 at 12:58 PM Pierre-Yves David
> > > <pierre-yves.david at ens-lyon.org <mailto:pierre-yves.david at ens-lyon.org>>
> > > wrote:
[snip]
> > > Wait, is this series directly related with the dirstate
> > > guard/transaction thing ? (as in, the next step we must do) or is it an
> > > unrelated fix to dirstate.
> > >
> > >
> > > I think it is related, but I'll let Fujiwara confirm.
> > >
> > > If it is unrelated, we should probably focus on fixing the dirstate
> > > write scheduling before working on this issue.
> > >
> > > And what do you recommend if it is related?
> >
> > I've to admit I stopped following your discussion closely after roudn
> > trip. If this is on the critical path for dirstateguard, we should
> > probably start building a solid repro before taking care of it.
>
> At first, I thought that this issue can be fixed fully separately from
> fixing issue4378 or so.
>
> But for total consistency of behavior at failure, we should make
> consensus about handling dirstate changes generally, before working
> more about issue4378. At least, deciding direction can avoid
> meaningless side trip or so for me :-)
>
> I'll post summary of issue4378 and this issue from the same point of
> view as a basis for discussion.
Sorry for late response.
I summarized about "dirstate changes at failure" as a basis for
discussion. Please comment !
BTW, it is assumed that `dirstate.normallookup` is invoked at
`repo.wwrite` or so, because, without these changes, problems like
issue4583 ("changed files are treated as clean") appear occasionally,
even if processing terminates successfully (e.g, at `hg revert`, `hg
import` and so on):
https://selenic.com/pipermail/mercurial-devel/2015-May/070384.html
https://selenic.com/pipermail/mercurial-devel/2015-May/070386.html
--------------
1. `hg commit`
--------------
Let's see typical case `hg commit`.
Lines ending with "#N" means "dirstate is changed here". Scope of
wlock/transaction is described in "with xxxx as yyyy:" style for
convenience.
================
commands.commit():
if opts['addremove']:
scmutil.addremove():
dirstate.update('files {added|removed|copied}') #1
# involving dirstate.write() at wlock.release()
repo.commit():
with repo.wlock() as wlock:
repo.status('to get current status') #2
editor('for commit message')
wctx.sub(s).commit('recursively') for subs
hook('precommit')
with repo.transaction('commit') as tr:
node = createcommit()
hook('pretxncommit')
dirstate.update(node) #3
hook('pretxnclose')
================
Changes at #3 above should be discarded for consistency of subsequent
`hg status` and so on, because:
- `dirstate.parents` refers rolled back (= not-existing) revision
- files are already `dirstate.normal`-ed
This is reason why (1) dirstate should be saved just before starting
transaction, and (2) restored if transaction is rolled back.
This was already discussed for "making in-memory dirstate changes
visible to external process" to fix issue4378.
http://thread.gmane.org/gmane.comp.version-control.mercurial.devel/80003/focus=80151
Then, how should we treat #1 and #2 ?
1. restore dirstate at failure of command
This policy isn't suitable for `hg commit --addremove` case,
because dirstate changes at #1 above should be kept for backward
compatibility, even if `hg commit` itself fails.
At least, the way to avoid restoring dirstate at failure (a kind
of the inverse of `dirstateguard` ?) is required for this policy.
For example, commands below expect that dirstate changes are kept
even if command itself is terminated with non-0 exit code
(detailed flows are shown in subsequent section):
- aborting by exception
`hg graft`, `hg rebase` and so on
- terminating normally with non-0 exit code
conflicts at `hg backout` without --merge
2. restore dirstate at failure in wlock scope
Changes at #2 (just `normal`-ing) can be discarded at failure in
wlock scope.
This policy is suitable at least for `hg commit` case.
But some other commands expect that dirstate changes are kept
even if command itself is aborted by exception, as described
above.
So, the way to avoid restoring dirstate at failure is required
for this policy, too. In addition to it, code path enclosed by
`repo.wlock()` + `wlock.release` may be nested at runtime, and
this may increase complexity.
3. keep dirstate changes (and write them out) even at failure
This is also suitable for `hg commit` case, because this is as
same as current behavior :-)
Let's see other cases, too.
--------------
2. `hg update`
--------------
================
commands.update():
merge.update():
with repo.wlock() as wlock:
wctx.files() / wctx.dirty() #1
hook('preupdate')
calculateupdates()
# critical region >>>>
repo.wwrite('all files to be updated')
wctx.sub(s).get('update recursively') for subs
dirstate.change('all updated files') #2
# critical region <<<<
hook('postupdate')
================
Changes at #1 above can be treated as similarly as ones at #2 in `hg
commit` case. So, let's ignore it here :-)
Aborting in "critical region" above causes the issue that changed
files are treated as clean, as I described in previous post.
http://thread.gmane.org/gmane.comp.version-control.mercurial.devel/80342/focus=80673
There are some solutions to fix this problem:
1. restore dirstate at failure of command
2. restore dirstate at failure in wlock scope
3. enclose "critical region" by `dirstateguard` to restore dirstate
at failure (but keep other dirstate changes even at failure)
4. apply `dirstate.normallookup` or so explicitly on files to be
updated before entering "critical region" (and keep dirstate
changes even at failure)
This is needed, even if `repo.wwrite` involves
`dirstate.normallookup`, because `repo.wwrite` itself may be
invoked in FORKED processes.
In fact, when `partial` argument of `merge.update` is specified
explicitly, (4) is needed, even if one of other than (4) is applied,
because updating dirstate (at #2 above) is omitted in such case (this
is used as a kind of "revert").
Then, (4) seems suitable for "critical region" in `hg update`.
--------------------------
3. Users of `merge.update`
--------------------------
There are some cases using `merge.update` directly or indirectly
(histedit and so on, too)
================
commands.backout():
with repo.wlock() as wlock:
if linearmerging:
# critical region >>>>
merge.update('for merging linearly') #1
repo.setparents() #2
# critical region <<<<
if conflicted:
return 1 # expecting dirstate.write
elif not commit:
return 0 # expecting dirstate.write
else:
merge.update('to the revision to be backed out') #3
cmduit.revert('to backout') #4
repo.commit('the revision backing out') #5
if merging:
merge.update('to the original parent') #6
merge.update('for merging with the revision backing out') #7
================
================
commands.graft():
with repo.wlock() as wlock:
for rev in revs:
merge.graft('to graft rev'):
# critical region >>>>
merge.update('for pseudo merging with graft source') #1
repo.setparents() #2
dirstate.copy('to duplicate copy information') #3
# critical region <<<<
if conflict:
throw abort() # expecting dirstate.write
repo.commit('the revision grafted') #4
================
(BTW, `merge.update` isn't yet included into "critical region" of
`hg graft`)
================
rebase.rebase():
with repo.wlock() as wlock:
merge.update('to destination') #1
for rev in revs:
# critical region >>>>
merge.update('for merging the revision to be rebased') #2
dirstate.copy('to duplicate copy information') #3
# critical region <<<<
if conflict:
raise InterventionRequired # expecting dirstate.write
with dirstateguard('rebase'):
repo.setparents('to drop 2nd parent of pseudo merge') #4
repo.commit('the revision rebased') #5
================
(BTW, "critical region" of `hg rebase` isn't yet guarded at all)
We can find something below out:
a. `merge.update` may be invoked multiple times in a wlock scope
This also means that "critical region" may be also invoked
multiple times in a wlock scope, and increases complexity of "a
kind of the inverse of `dirstateguard` or so".
b. additional processing may be needed after `merge.update` to use
result of it correctly
Result of each commands isn't valid, if a command is aborted in
"critical region". For example, lack of `repo.setparents()` at #2
of `hg graft` may cause committing invalid revision at subsequent
`hg graft --continue`.
So, dirstate changes in "critical region" should be discarded at
failure.
Then, enclosing "critical region" by `dirstateguard` seems easier than
"a kind of the inverse of `dirstateguard` or so" to restore dirstate
at failure.
-------------
4. Conclusion
-------------
According to cases above, "restore dirstate at failure of command (or
in wlock scope)" policy doesn't seem suitable for current Mercurial
implementation.
Then, what about the policy below ?
1. treat dirstate changes in transaction as "all or nothing"
- save dirstate just before starting transaction, and
- restore it if transaction is rolled back
2. keep dirstate changes outside transaction (and write them out)
even at failure
3. enclose "critical region" by `dirstateguard` to discard dirstate
changes at failure
4. but in `merge.update`, apply `dirstate.normallookup` explicitly
on files to be updated before entering into "critical region"
instead of enclosing it by `dirstateguard` because:
- `repo.wwrite` in FORKED processes doesn't write
`normallookup`-ed result into `.hg/dirstate`, and
- explicit `partial` argument of `merge.update` always omits
updating dirstate
----------------------------------------------------------------------
[FUJIWARA Katsunori] foozy at lares.dti.ne.jp
More information about the Mercurial-devel
mailing list