[PATCH 2 of 5] localrepo: invoke dirstate.unsureifambig in wwrite for safety

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Thu Jun 11 12:55:04 CDT 2015


At Wed, 10 Jun 2015 11:32:25 -0700,
Pierre-Yves David wrote:
> 
> 
> 
> On 06/10/2015 09:39 AM, Martin von Zweigbergk wrote:
> >
> >
> > On Tue, Jun 9, 2015 at 12:58 PM Pierre-Yves David
> > <pierre-yves.david at ens-lyon.org <mailto:pierre-yves.david at ens-lyon.org>>
> > wrote:
> >
> >
> >
> >     On 06/03/2015 10:32 AM, FUJIWARA Katsunori wrote:
> >      >
> >      > At Tue, 02 Jun 2015 19:40:26 +0000,
> >      > Martin von Zweigbergk wrote:
> >      >
> >      >> On Tue, Jun 2, 2015 at 9:27 AM FUJIWARA Katsunori
> >     <foozy at lares.dti.ne.jp <mailto:foozy at lares.dti.ne.jp>>
> >      >> wrote:
> >      >
> >      >>> At Mon, 01 Jun 2015 18:57:16 +0000,
> >      >>> Martin von Zweigbergk wrote:
> >      >
> >      >>>> On Mon, Jun 1, 2015 at 11:07 AM FUJIWARA Katsunori <
> >      >>> foozy at lares.dti.ne.jp <mailto:foozy at lares.dti.ne.jp>>
> >      >>>> wrote:
> >      >>>>
> >      >>>>> At Mon, 01 Jun 2015 17:02:10 +0000,
> >      >>>>> Martin von Zweigbergk wrote:
> >      >
> >      > (snip)
> >      >
> >      >>>>>    (2) execute `hg merge`, but it is aborted before "update
> >     dirstate"
> >      >>>>>        for example:
> >      >>>>>        - invalid updating in subrepositories
> >      >>>>>        - keyboard interruption by user, file I/O error and so on
> >      >>>>>> $ hg --config extensions.abort=$TESTTMP/abort.py merge 5
> >      >>>>>
> >      >>>>>    (3) (re-)set timestamp of 'b' on the filesystem for
> >     mis-leading
> >      >>>>>        this emulates that `hg merge` update 'b' before
> >      >>> 200001010000(00)+1sec
> >      >>>>>> $ touch -t 200001010000 b
> >      >>>>>
> >      >>>>
> >      >>>> Since 'hg status' above was run at 200001010000(00)+1sec (or
> >     later), how
> >      >>>> can this (interrupted) merge be run at 200001010000(00)?
> >      >>>
> >      >>> Oh, your wondering is natural. My explanation was wrong :-<
> >      >>>
> >      >>> In real world, steps below reproduce this issue:
> >      >>>
> >      >>
> >      >> Great explanation! Thanks!
> >      >>
> >      >> So the problem is that we mark the file clean in dirstate as a
> >     result of
> >      >> checking its status, then we update it on disk, abort the
> >     update, write the
> >      >> clean dirstate to disk and stay on the old revision. It seems a
> >     little
> >      >> strange to me that we write the dirstate even on failure. Do you
> >     know if
> >      >> there is a good reason for that?
> >      >
> >      > I don't know actual reason. It may be:
> >      >
> >      >    Invalidating in-memory changes at aborting has subsequent hg
> >      >    commands pay cost of "checking dirty-ness" again.
> >      >
> >      >    The more "checking dirty-ness" costs, the more and larger possibly
> >      >    dirty files there are.
> >      >
> >      > On the other hand, Pierre-Yves suggested me to discard all dirstate
> >      > changes inside transaction at failure in the article about making
> >      > in-memory dirstate changes visible to external process (in this case,
> >      > important point is "dirstate may refer rollback-ed revisions").
> >      >
> >      > So, I'll re-examine what we should treat in-memory dirstate
> >     changes at
> >      > failure widely (maybe after fixing this issue :-))
> >      >
> >      >
> >      >> Perhaps another option is to mark all files in the merge
> >     normallookup()
> >      >> before applyupdates()? That should mean that an abort later on
> >     would be
> >      >> safe, no?
> >      >
> >      > `dirstate.normallookup()` always discards all of mode, size and
> >      > timestamp. This forces subsequent status examinations to compare file
> >      > contents until it marked as clean, because lack of "mode and size"
> >      > marks such files as "unsure" until it is marked as "clean".
> >      >
> >      > This costs so much with large "possibly dirty" files.
> >      >
> >      >
> >      > BTW, I hit an idea of dirstate functionality to discard timestamp but
> >      > keep expected mode/size (current `normallookup` discards all of
> >     them).
> >      >
> >      > When at least one of mode/size of file is changed, this functionality
> >      > can avoid comparison of file content at subsequent status checking.
> >      >
> >      >
> >      >>>    == @200001010000(00)
> >      >>>
> >      >>>    (A) 'b' should be modified at this point, but
> >      >>>
> >      >>>    (B) it is assumed that dirstate timestamp of 'b' is -1 at
> >     this point
> >      >>>
> >      >>>    (C) invoke `hg merge`
> >      >>>
> >      >>>      (C-1) get wlock
> >      >>>
> >      >>>      (C-2) check uncommitted changes in `merge.update()` by
> >     `wc.files()`:
> >      >>>
> >      >>>            this indirectly sets dirstate timestamp of 'b' as
> >      >>> 200001010000(00)
> >      >>>
> >      >>>              wc.files() => wc._status() => repo.status() => ... =>
> >      >>>              wc._dirstatestatus() => wc._checklookup() =>
> >     dirstate.normal()
> >      >>>
> >      >>>
> >      >> Would an alternative fix be to write the dirstate here (and after
> >      >> recordupdates(), of course)? Would that be unacceptably slow,
> >     perhaps?
> >      >
> >      > It should be cheaper than merge/update itself.
> >      >
> >      > But we may have to put explicit `dirstate.write()` into many code
> >      > paths for safety. Today, we know that putting it before
> >      > `applyupdates()` avoids this issue. Then, tomorrow, we may have
> >     to put
> >      > it into other places to avoid similar issues.
> >      >
> >      > So, I want to find generalized fixing.
> >
> >     Wait, is this series directly related with the dirstate
> >     guard/transaction thing ? (as in, the next step we must do) or is it an
> >     unrelated fix to dirstate.
> >
> >
> > I think it is related, but I'll let Fujiwara confirm.
> >
> >     If it is unrelated, we should probably focus on fixing the dirstate
> >     write scheduling before working on this issue.
> >
> > And what do you recommend if it is related?
> 
> I've to admit I stopped following your discussion closely after roudn 
> trip. If this is on the critical path for dirstateguard, we should 
> probably start building a solid repro before taking care of it.

At first, I thought that this issue can be fixed fully separately from
fixing issue4378 or so.

But for total consistency of behavior at failure, we should make
consensus about handling dirstate changes generally, before working
more about issue4378. At least, deciding direction can avoid
meaningless side trip or so for me :-)

I'll post summary of issue4378 and this issue from the same point of
view as a basis for discussion.

 
> -- 
> Pierre-Yves David
> 

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp


More information about the Mercurial-devel mailing list