[PATCH 2 of 5] localrepo: invoke dirstate.unsureifambig in wwrite for safety

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Wed Jun 3 12:32:31 CDT 2015


At Tue, 02 Jun 2015 19:40:26 +0000,
Martin von Zweigbergk wrote:

> On Tue, Jun 2, 2015 at 9:27 AM FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> wrote:

> > At Mon, 01 Jun 2015 18:57:16 +0000,
> > Martin von Zweigbergk wrote:

> > > On Mon, Jun 1, 2015 at 11:07 AM FUJIWARA Katsunori <
> > foozy at lares.dti.ne.jp>
> > > wrote:
> > >
> > > > At Mon, 01 Jun 2015 17:02:10 +0000,
> > > > Martin von Zweigbergk wrote:

(snip)

> > > >   (2) execute `hg merge`, but it is aborted before "update dirstate"
> > > >       for example:
> > > >       - invalid updating in subrepositories
> > > >       - keyboard interruption by user, file I/O error and so on
> > > > > $ hg --config extensions.abort=$TESTTMP/abort.py merge 5
> > > >
> > > >   (3) (re-)set timestamp of 'b' on the filesystem for mis-leading
> > > >       this emulates that `hg merge` update 'b' before
> > 200001010000(00)+1sec
> > > > > $ touch -t 200001010000 b
> > > >
> > >
> > > Since 'hg status' above was run at 200001010000(00)+1sec (or later), how
> > > can this (interrupted) merge be run at 200001010000(00)?
> >
> > Oh, your wondering is natural. My explanation was wrong :-<
> >
> > In real world, steps below reproduce this issue:
> >
> 
> Great explanation! Thanks!
> 
> So the problem is that we mark the file clean in dirstate as a result of
> checking its status, then we update it on disk, abort the update, write the
> clean dirstate to disk and stay on the old revision. It seems a little
> strange to me that we write the dirstate even on failure. Do you know if
> there is a good reason for that?

I don't know actual reason. It may be:

  Invalidating in-memory changes at aborting has subsequent hg
  commands pay cost of "checking dirty-ness" again.

  The more "checking dirty-ness" costs, the more and larger possibly
  dirty files there are.

On the other hand, Pierre-Yves suggested me to discard all dirstate
changes inside transaction at failure in the article about making
in-memory dirstate changes visible to external process (in this case,
important point is "dirstate may refer rollback-ed revisions").

So, I'll re-examine what we should treat in-memory dirstate changes at
failure widely (maybe after fixing this issue :-))


> Perhaps another option is to mark all files in the merge normallookup()
> before applyupdates()? That should mean that an abort later on would be
> safe, no?

`dirstate.normallookup()` always discards all of mode, size and
timestamp. This forces subsequent status examinations to compare file
contents until it marked as clean, because lack of "mode and size"
marks such files as "unsure" until it is marked as "clean".

This costs so much with large "possibly dirty" files.


BTW, I hit an idea of dirstate functionality to discard timestamp but
keep expected mode/size (current `normallookup` discards all of them).

When at least one of mode/size of file is changed, this functionality
can avoid comparison of file content at subsequent status checking.


> >   == @200001010000(00)
> >
> >   (A) 'b' should be modified at this point, but
> >
> >   (B) it is assumed that dirstate timestamp of 'b' is -1 at this point
> >
> >   (C) invoke `hg merge`
> >
> >     (C-1) get wlock
> >
> >     (C-2) check uncommitted changes in `merge.update()` by `wc.files()`:
> >
> >           this indirectly sets dirstate timestamp of 'b' as
> > 200001010000(00)
> >
> >             wc.files() => wc._status() => repo.status() => ... =>
> >             wc._dirstatestatus() => wc._checklookup() => dirstate.normal()
> >
> >
> Would an alternative fix be to write the dirstate here (and after
> recordupdates(), of course)? Would that be unacceptably slow, perhaps?

It should be cheaper than merge/update itself.

But we may have to put explicit `dirstate.write()` into many code
paths for safety. Today, we know that putting it before
`applyupdates()` avoids this issue. Then, tomorrow, we may have to put
it into other places to avoid similar issues.

So, I want to find generalized fixing.


> >     (C-3) `merge.applyupates()` overwrites 'b' by the data in same size
> >
> >           FS timestamp of it is still 200001010000(00)
> >
> >     (C-3) fail before `merge.recordupdates()`
> >
> >   == @200001010000(00) + 1
> >
> >     (C-4) release wlock, and this causes written dirty dirstate out
> >
> >           this also writes out 200001010000(00) of 'b' successfully,
> >           because FS timestamp of dirstate is 200001010000(00) + 1
> >
> >   (D) `hg status` treats 'b' as CLEAN, because FS timestamp/size of it
> >       aren't changed
> >
> > Then, steps below (= without 'hg status -A b') can reproduce this
> > issue almost always:
> >
> >   $ hg update -q -C 3
> >   $ touch -t 200001010000 b
> >   $ hg --config extensions.abort=$TESTTMP/abort.py merge 5
> >   $ touch -t 200001010000 b
> >
> > But timestamp of dirstate may be set accidentally at `hg update`, and
> > this breaks assumption (B) above.
> >
> > This is timing critical problem and we can't fully control it :-<
> >
> > On the other hand, tests in my patch execute steps below to make
> > similar (but a little different in detail) situation:
> >
> >   == @200001010000(00)
> >
> >   (A) `touch -t 200001010000 b` emulates that 'b' is modified at this point
> >
> >   (B') `hg status` can write 200001010000(00) of 'b' into dirstate
> > successfully
> >
> >       because `hg status` isn't actually executed at 200001010000(00) :-)
> >
> >   (C) invoke `hg merge`
> >
> >     (C-1) get wlock
> >
> >     (C-2) check uncommitted changes in `merge.update()` by `wc.files()`
> >
> >           'b' is already known as CLEAN by 200001010000(00)
> >
> >     (C-3) `merge.applyupates()` overwrites 'b' by the data in same size
> >
> >           FS timestamp of it isn't 200001010000(00), because `hg
> >           merge` isn't actually executed at 200001010000(00)
> >
> >           but `touch -t 200001010000 b` after `hg merge` emulates it.
> >
> >     (C-3) fail before `merge.recordupdates()`
> >
> >   == @200001010000(00) + 1
> >
> >     (C-4) release wlock, and this causes written dirty dirstate out
> >
> >           dirstate timestamp of 'b' is still 200001010000(00)
> >
> >   (D) `hg status` treats 'b' as CLEAN, because FS timestamp/size of it
> >       aren't changed (only if this issue isn't fixed well)
> >

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp


More information about the Mercurial-devel mailing list