[PATCH 2 of 5] localrepo: invoke dirstate.unsureifambig in wwrite for safety

Martin von Zweigbergk martinvonz at google.com
Wed Jun 10 11:39:25 CDT 2015


On Tue, Jun 9, 2015 at 12:58 PM Pierre-Yves David <
pierre-yves.david at ens-lyon.org> wrote:

>
>
> On 06/03/2015 10:32 AM, FUJIWARA Katsunori wrote:
> >
> > At Tue, 02 Jun 2015 19:40:26 +0000,
> > Martin von Zweigbergk wrote:
> >
> >> On Tue, Jun 2, 2015 at 9:27 AM FUJIWARA Katsunori <
> foozy at lares.dti.ne.jp>
> >> wrote:
> >
> >>> At Mon, 01 Jun 2015 18:57:16 +0000,
> >>> Martin von Zweigbergk wrote:
> >
> >>>> On Mon, Jun 1, 2015 at 11:07 AM FUJIWARA Katsunori <
> >>> foozy at lares.dti.ne.jp>
> >>>> wrote:
> >>>>
> >>>>> At Mon, 01 Jun 2015 17:02:10 +0000,
> >>>>> Martin von Zweigbergk wrote:
> >
> > (snip)
> >
> >>>>>    (2) execute `hg merge`, but it is aborted before "update dirstate"
> >>>>>        for example:
> >>>>>        - invalid updating in subrepositories
> >>>>>        - keyboard interruption by user, file I/O error and so on
> >>>>>> $ hg --config extensions.abort=$TESTTMP/abort.py merge 5
> >>>>>
> >>>>>    (3) (re-)set timestamp of 'b' on the filesystem for mis-leading
> >>>>>        this emulates that `hg merge` update 'b' before
> >>> 200001010000(00)+1sec
> >>>>>> $ touch -t 200001010000 b
> >>>>>
> >>>>
> >>>> Since 'hg status' above was run at 200001010000(00)+1sec (or later),
> how
> >>>> can this (interrupted) merge be run at 200001010000(00)?
> >>>
> >>> Oh, your wondering is natural. My explanation was wrong :-<
> >>>
> >>> In real world, steps below reproduce this issue:
> >>>
> >>
> >> Great explanation! Thanks!
> >>
> >> So the problem is that we mark the file clean in dirstate as a result of
> >> checking its status, then we update it on disk, abort the update, write
> the
> >> clean dirstate to disk and stay on the old revision. It seems a little
> >> strange to me that we write the dirstate even on failure. Do you know if
> >> there is a good reason for that?
> >
> > I don't know actual reason. It may be:
> >
> >    Invalidating in-memory changes at aborting has subsequent hg
> >    commands pay cost of "checking dirty-ness" again.
> >
> >    The more "checking dirty-ness" costs, the more and larger possibly
> >    dirty files there are.
> >
> > On the other hand, Pierre-Yves suggested me to discard all dirstate
> > changes inside transaction at failure in the article about making
> > in-memory dirstate changes visible to external process (in this case,
> > important point is "dirstate may refer rollback-ed revisions").
> >
> > So, I'll re-examine what we should treat in-memory dirstate changes at
> > failure widely (maybe after fixing this issue :-))
> >
> >
> >> Perhaps another option is to mark all files in the merge normallookup()
> >> before applyupdates()? That should mean that an abort later on would be
> >> safe, no?
> >
> > `dirstate.normallookup()` always discards all of mode, size and
> > timestamp. This forces subsequent status examinations to compare file
> > contents until it marked as clean, because lack of "mode and size"
> > marks such files as "unsure" until it is marked as "clean".
> >
> > This costs so much with large "possibly dirty" files.
> >
> >
> > BTW, I hit an idea of dirstate functionality to discard timestamp but
> > keep expected mode/size (current `normallookup` discards all of them).
> >
> > When at least one of mode/size of file is changed, this functionality
> > can avoid comparison of file content at subsequent status checking.
> >
> >
> >>>    == @200001010000(00)
> >>>
> >>>    (A) 'b' should be modified at this point, but
> >>>
> >>>    (B) it is assumed that dirstate timestamp of 'b' is -1 at this point
> >>>
> >>>    (C) invoke `hg merge`
> >>>
> >>>      (C-1) get wlock
> >>>
> >>>      (C-2) check uncommitted changes in `merge.update()` by
> `wc.files()`:
> >>>
> >>>            this indirectly sets dirstate timestamp of 'b' as
> >>> 200001010000(00)
> >>>
> >>>              wc.files() => wc._status() => repo.status() => ... =>
> >>>              wc._dirstatestatus() => wc._checklookup() =>
> dirstate.normal()
> >>>
> >>>
> >> Would an alternative fix be to write the dirstate here (and after
> >> recordupdates(), of course)? Would that be unacceptably slow, perhaps?
> >
> > It should be cheaper than merge/update itself.
> >
> > But we may have to put explicit `dirstate.write()` into many code
> > paths for safety. Today, we know that putting it before
> > `applyupdates()` avoids this issue. Then, tomorrow, we may have to put
> > it into other places to avoid similar issues.
> >
> > So, I want to find generalized fixing.
>
> Wait, is this series directly related with the dirstate
> guard/transaction thing ? (as in, the next step we must do) or is it an
> unrelated fix to dirstate.
>

I think it is related, but I'll let Fujiwara confirm.


> If it is unrelated, we should probably focus on fixing the dirstate
> write scheduling before working on this issue.
>

And what do you recommend if it is related?


>
> --
> Pierre-Yves David
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20150610/601f56a6/attachment.html>


More information about the Mercurial-devel mailing list