[PATCH 2 of 5] localrepo: invoke dirstate.unsureifambig in wwrite for safety

Tue Jun 9 14:57:58 CDT 2015

On 06/03/2015 10:32 AM, FUJIWARA Katsunori wrote:
>
> At Tue, 02 Jun 2015 19:40:26 +0000,
> Martin von Zweigbergk wrote:
>
>> On Tue, Jun 2, 2015 at 9:27 AM FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
>> wrote:
>
>>> At Mon, 01 Jun 2015 18:57:16 +0000,
>>> Martin von Zweigbergk wrote:
>
>>>> On Mon, Jun 1, 2015 at 11:07 AM FUJIWARA Katsunori <
>>> foozy at lares.dti.ne.jp>
>>>> wrote:
>>>>
>>>>> At Mon, 01 Jun 2015 17:02:10 +0000,
>>>>> Martin von Zweigbergk wrote:
>
> (snip)
>
>>>>>    (2) execute `hg merge`, but it is aborted before "update dirstate"
>>>>>        for example:
>>>>>        - invalid updating in subrepositories
>>>>>        - keyboard interruption by user, file I/O error and so on
>>>>>> $ hg --config extensions.abort=$TESTTMP/abort.py merge 5
>>>>>
>>>>>    (3) (re-)set timestamp of 'b' on the filesystem for mis-leading
>>>>>        this emulates that `hg merge` update 'b' before
>>> 200001010000(00)+1sec
>>>>>> $ touch -t 200001010000 b
>>>>>
>>>>
>>>> Since 'hg status' above was run at 200001010000(00)+1sec (or later), how
>>>> can this (interrupted) merge be run at 200001010000(00)?
>>>
>>> Oh, your wondering is natural. My explanation was wrong :-<
>>>
>>> In real world, steps below reproduce this issue:
>>>
>>
>> Great explanation! Thanks!
>>
>> So the problem is that we mark the file clean in dirstate as a result of
>> checking its status, then we update it on disk, abort the update, write the
>> clean dirstate to disk and stay on the old revision. It seems a little
>> strange to me that we write the dirstate even on failure. Do you know if
>> there is a good reason for that?
>
> I don't know actual reason. It may be:
>
>    Invalidating in-memory changes at aborting has subsequent hg
>    commands pay cost of "checking dirty-ness" again.
>
>    The more "checking dirty-ness" costs, the more and larger possibly
>    dirty files there are.
>
> On the other hand, Pierre-Yves suggested me to discard all dirstate
> changes inside transaction at failure in the article about making
> in-memory dirstate changes visible to external process (in this case,
> important point is "dirstate may refer rollback-ed revisions").
>
> So, I'll re-examine what we should treat in-memory dirstate changes at
> failure widely (maybe after fixing this issue :-))
>
>
>> Perhaps another option is to mark all files in the merge normallookup()
>> before applyupdates()? That should mean that an abort later on would be
>> safe, no?
>
> `dirstate.normallookup()` always discards all of mode, size and
> timestamp. This forces subsequent status examinations to compare file
> contents until it marked as clean, because lack of "mode and size"
> marks such files as "unsure" until it is marked as "clean".
>
> This costs so much with large "possibly dirty" files.
>
>
> BTW, I hit an idea of dirstate functionality to discard timestamp but
> keep expected mode/size (current `normallookup` discards all of them).
>
> When at least one of mode/size of file is changed, this functionality
> can avoid comparison of file content at subsequent status checking.
>
>
>>>    == @200001010000(00)
>>>
>>>    (A) 'b' should be modified at this point, but
>>>
>>>    (B) it is assumed that dirstate timestamp of 'b' is -1 at this point
>>>
>>>    (C) invoke `hg merge`
>>>
>>>      (C-1) get wlock
>>>
>>>      (C-2) check uncommitted changes in `merge.update()` by `wc.files()`:
>>>
>>>            this indirectly sets dirstate timestamp of 'b' as
>>> 200001010000(00)
>>>
>>>              wc.files() => wc._status() => repo.status() => ... =>
>>>              wc._dirstatestatus() => wc._checklookup() => dirstate.normal()
>>>
>>>
>> Would an alternative fix be to write the dirstate here (and after
>> recordupdates(), of course)? Would that be unacceptably slow, perhaps?
>
> It should be cheaper than merge/update itself.
>
> But we may have to put explicit `dirstate.write()` into many code
> paths for safety. Today, we know that putting it before
> `applyupdates()` avoids this issue. Then, tomorrow, we may have to put
> it into other places to avoid similar issues.
>
> So, I want to find generalized fixing.

Wait, is this series directly related with the dirstate 
guard/transaction thing ? (as in, the next step we must do) or is it an 
unrelated fix to dirstate.

If it is unrelated, we should probably focus on fixing the dirstate 
write scheduling before working on this issue.

-- 
Pierre-Yves David