[PATCH 2 of 5] localrepo: invoke dirstate.unsureifambig in wwrite for safety

Pierre-Yves David pierre-yves.david at ens-lyon.org
Wed Jun 10 13:32:25 CDT 2015



On 06/10/2015 09:39 AM, Martin von Zweigbergk wrote:
>
>
> On Tue, Jun 9, 2015 at 12:58 PM Pierre-Yves David
> <pierre-yves.david at ens-lyon.org <mailto:pierre-yves.david at ens-lyon.org>>
> wrote:
>
>
>
>     On 06/03/2015 10:32 AM, FUJIWARA Katsunori wrote:
>      >
>      > At Tue, 02 Jun 2015 19:40:26 +0000,
>      > Martin von Zweigbergk wrote:
>      >
>      >> On Tue, Jun 2, 2015 at 9:27 AM FUJIWARA Katsunori
>     <foozy at lares.dti.ne.jp <mailto:foozy at lares.dti.ne.jp>>
>      >> wrote:
>      >
>      >>> At Mon, 01 Jun 2015 18:57:16 +0000,
>      >>> Martin von Zweigbergk wrote:
>      >
>      >>>> On Mon, Jun 1, 2015 at 11:07 AM FUJIWARA Katsunori <
>      >>> foozy at lares.dti.ne.jp <mailto:foozy at lares.dti.ne.jp>>
>      >>>> wrote:
>      >>>>
>      >>>>> At Mon, 01 Jun 2015 17:02:10 +0000,
>      >>>>> Martin von Zweigbergk wrote:
>      >
>      > (snip)
>      >
>      >>>>>    (2) execute `hg merge`, but it is aborted before "update
>     dirstate"
>      >>>>>        for example:
>      >>>>>        - invalid updating in subrepositories
>      >>>>>        - keyboard interruption by user, file I/O error and so on
>      >>>>>> $ hg --config extensions.abort=$TESTTMP/abort.py merge 5
>      >>>>>
>      >>>>>    (3) (re-)set timestamp of 'b' on the filesystem for
>     mis-leading
>      >>>>>        this emulates that `hg merge` update 'b' before
>      >>> 200001010000(00)+1sec
>      >>>>>> $ touch -t 200001010000 b
>      >>>>>
>      >>>>
>      >>>> Since 'hg status' above was run at 200001010000(00)+1sec (or
>     later), how
>      >>>> can this (interrupted) merge be run at 200001010000(00)?
>      >>>
>      >>> Oh, your wondering is natural. My explanation was wrong :-<
>      >>>
>      >>> In real world, steps below reproduce this issue:
>      >>>
>      >>
>      >> Great explanation! Thanks!
>      >>
>      >> So the problem is that we mark the file clean in dirstate as a
>     result of
>      >> checking its status, then we update it on disk, abort the
>     update, write the
>      >> clean dirstate to disk and stay on the old revision. It seems a
>     little
>      >> strange to me that we write the dirstate even on failure. Do you
>     know if
>      >> there is a good reason for that?
>      >
>      > I don't know actual reason. It may be:
>      >
>      >    Invalidating in-memory changes at aborting has subsequent hg
>      >    commands pay cost of "checking dirty-ness" again.
>      >
>      >    The more "checking dirty-ness" costs, the more and larger possibly
>      >    dirty files there are.
>      >
>      > On the other hand, Pierre-Yves suggested me to discard all dirstate
>      > changes inside transaction at failure in the article about making
>      > in-memory dirstate changes visible to external process (in this case,
>      > important point is "dirstate may refer rollback-ed revisions").
>      >
>      > So, I'll re-examine what we should treat in-memory dirstate
>     changes at
>      > failure widely (maybe after fixing this issue :-))
>      >
>      >
>      >> Perhaps another option is to mark all files in the merge
>     normallookup()
>      >> before applyupdates()? That should mean that an abort later on
>     would be
>      >> safe, no?
>      >
>      > `dirstate.normallookup()` always discards all of mode, size and
>      > timestamp. This forces subsequent status examinations to compare file
>      > contents until it marked as clean, because lack of "mode and size"
>      > marks such files as "unsure" until it is marked as "clean".
>      >
>      > This costs so much with large "possibly dirty" files.
>      >
>      >
>      > BTW, I hit an idea of dirstate functionality to discard timestamp but
>      > keep expected mode/size (current `normallookup` discards all of
>     them).
>      >
>      > When at least one of mode/size of file is changed, this functionality
>      > can avoid comparison of file content at subsequent status checking.
>      >
>      >
>      >>>    == @200001010000(00)
>      >>>
>      >>>    (A) 'b' should be modified at this point, but
>      >>>
>      >>>    (B) it is assumed that dirstate timestamp of 'b' is -1 at
>     this point
>      >>>
>      >>>    (C) invoke `hg merge`
>      >>>
>      >>>      (C-1) get wlock
>      >>>
>      >>>      (C-2) check uncommitted changes in `merge.update()` by
>     `wc.files()`:
>      >>>
>      >>>            this indirectly sets dirstate timestamp of 'b' as
>      >>> 200001010000(00)
>      >>>
>      >>>              wc.files() => wc._status() => repo.status() => ... =>
>      >>>              wc._dirstatestatus() => wc._checklookup() =>
>     dirstate.normal()
>      >>>
>      >>>
>      >> Would an alternative fix be to write the dirstate here (and after
>      >> recordupdates(), of course)? Would that be unacceptably slow,
>     perhaps?
>      >
>      > It should be cheaper than merge/update itself.
>      >
>      > But we may have to put explicit `dirstate.write()` into many code
>      > paths for safety. Today, we know that putting it before
>      > `applyupdates()` avoids this issue. Then, tomorrow, we may have
>     to put
>      > it into other places to avoid similar issues.
>      >
>      > So, I want to find generalized fixing.
>
>     Wait, is this series directly related with the dirstate
>     guard/transaction thing ? (as in, the next step we must do) or is it an
>     unrelated fix to dirstate.
>
>
> I think it is related, but I'll let Fujiwara confirm.
>
>
>     If it is unrelated, we should probably focus on fixing the dirstate
>     write scheduling before working on this issue.
>
>
> And what do you recommend if it is related?

I've to admit I stopped following your discussion closely after roudn 
trip. If this is on the critical path for dirstateguard, we should 
probably start building a solid repro before taking care of it.

-- 
Pierre-Yves David


More information about the Mercurial-devel mailing list