[PATCH 2 of 5] localrepo: invoke dirstate.unsureifambig in wwrite for safety
Martin von Zweigbergk
martinvonz at google.com
Tue Jun 2 14:40:26 CDT 2015
On Tue, Jun 2, 2015 at 9:27 AM FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
wrote:
>
> At Mon, 01 Jun 2015 18:57:16 +0000,
> Martin von Zweigbergk wrote:
> >
> > [1 <text/plain; UTF-8 (7bit)>]
> > On Mon, Jun 1, 2015 at 11:07 AM FUJIWARA Katsunori <
> foozy at lares.dti.ne.jp>
> > wrote:
> >
> > > At Mon, 01 Jun 2015 17:02:10 +0000,
> > > Martin von Zweigbergk wrote:
> > > >
> > > > [1 <text/plain; UTF-8 (7bit)>]
> > > > Thanks. I think I knew most of that, but the subrepo case for
> interrupted
> > > > update was a good example I had not thought about. My question
> remains,
> > > > however. In the sequence in this test case, you do this:
> > > >
> > > > $ hg update -q -C 3
> > > > $ touch -t 200001010000 b
> > > > $ hg status -A b
> > > > $ hg --config extensions.abort=$TESTTMP/abort.py merge 5
> > > > $ touch -t 200001010000 b
> > > >
> > > > What I'm wondering is what real-world scenarios could lead to file
> b's
> > > > timestamp being the same in those two cases (where you artificially
> touch
> > > > them here). Since we said that 'hg status' is run strictly after (at
> > > > whatever granularity is recorded by the FS, perhaps second-level),
> how
> > > can
> > > > the 'hg merge' produce a timestamp for file b that's the same as
> what 'hg
> > > > update' does?
> > >
> > > `hg merge` can be aborted by same reason at `hg update` before
> > > updating dirstate:
> > >
> > > (0) cleanup working directory for examination of `hg merge`
> > > (let's ignore this while focusing on examination of `hg merge`)
> > > > $ hg update -q -C 3
> > >
> > > (1) ensure forcibly that `repo.dirstate['b']` has valid timestamp
> > > > $ touch -t 200001010000 b
> > > > $ hg status -A b
> > >
> >
> > We said that this 'hg status' is run at 200001010000(00)+1sec (otherwise
> > the timestamp in the dirstate would be unset). Correct?
>
> Yes, it is correct.
>
> > >
> > > (2) execute `hg merge`, but it is aborted before "update dirstate"
> > > for example:
> > > - invalid updating in subrepositories
> > > - keyboard interruption by user, file I/O error and so on
> > > > $ hg --config extensions.abort=$TESTTMP/abort.py merge 5
> > >
> > > (3) (re-)set timestamp of 'b' on the filesystem for mis-leading
> > > this emulates that `hg merge` update 'b' before
> 200001010000(00)+1sec
> > > > $ touch -t 200001010000 b
> > >
> >
> > Since 'hg status' above was run at 200001010000(00)+1sec (or later), how
> > can this (interrupted) merge be run at 200001010000(00)?
>
> Oh, your wondering is natural. My explanation was wrong :-<
>
> In real world, steps below reproduce this issue:
>
Great explanation! Thanks!
So the problem is that we mark the file clean in dirstate as a result of
checking its status, then we update it on disk, abort the update, write the
clean dirstate to disk and stay on the old revision. It seems a little
strange to me that we write the dirstate even on failure. Do you know if
there is a good reason for that?
Perhaps another option is to mark all files in the merge normallookup()
before applyupdates()? That should mean that an abort later on would be
safe, no?
> == @200001010000(00)
>
> (A) 'b' should be modified at this point, but
>
> (B) it is assumed that dirstate timestamp of 'b' is -1 at this point
>
> (C) invoke `hg merge`
>
> (C-1) get wlock
>
> (C-2) check uncommitted changes in `merge.update()` by `wc.files()`:
>
> this indirectly sets dirstate timestamp of 'b' as
> 200001010000(00)
>
> wc.files() => wc._status() => repo.status() => ... =>
> wc._dirstatestatus() => wc._checklookup() => dirstate.normal()
>
>
Would an alternative fix be to write the dirstate here (and after
recordupdates(), of course)? Would that be unacceptably slow, perhaps?
> (C-3) `merge.applyupates()` overwrites 'b' by the data in same size
>
> FS timestamp of it is still 200001010000(00)
>
> (C-3) fail before `merge.recordupdates()`
>
> == @200001010000(00) + 1
>
> (C-4) release wlock, and this causes written dirty dirstate out
>
> this also writes out 200001010000(00) of 'b' successfully,
> because FS timestamp of dirstate is 200001010000(00) + 1
>
> (D) `hg status` treats 'b' as CLEAN, because FS timestamp/size of it
> aren't changed
>
> Then, steps below (= without 'hg status -A b') can reproduce this
> issue almost always:
>
> $ hg update -q -C 3
> $ touch -t 200001010000 b
> $ hg --config extensions.abort=$TESTTMP/abort.py merge 5
> $ touch -t 200001010000 b
>
> But timestamp of dirstate may be set accidentally at `hg update`, and
> this breaks assumption (B) above.
>
> This is timing critical problem and we can't fully control it :-<
>
> On the other hand, tests in my patch execute steps below to make
> similar (but a little different in detail) situation:
>
> == @200001010000(00)
>
> (A) `touch -t 200001010000 b` emulates that 'b' is modified at this point
>
> (B') `hg status` can write 200001010000(00) of 'b' into dirstate
> successfully
>
> because `hg status` isn't actually executed at 200001010000(00) :-)
>
> (C) invoke `hg merge`
>
> (C-1) get wlock
>
> (C-2) check uncommitted changes in `merge.update()` by `wc.files()`
>
> 'b' is already known as CLEAN by 200001010000(00)
>
> (C-3) `merge.applyupates()` overwrites 'b' by the data in same size
>
> FS timestamp of it isn't 200001010000(00), because `hg
> merge` isn't actually executed at 200001010000(00)
>
> but `touch -t 200001010000 b` after `hg merge` emulates it.
>
> (C-3) fail before `merge.recordupdates()`
>
> == @200001010000(00) + 1
>
> (C-4) release wlock, and this causes written dirty dirstate out
>
> dirstate timestamp of 'b' is still 200001010000(00)
>
> (D) `hg status` treats 'b' as CLEAN, because FS timestamp/size of it
> aren't changed (only if this issue isn't fixed well)
>
>
>
> > > In addition to it, '.hgsubstate' may have to be focused on.
> > >
> > > Regardless of `hg update`/`hg merge` result, size of it on the
> > > filesystem is often kept, because '.hgsubstate' consists of "hashid
> > > subrepopath": "hashid" of hg subrepo is always 40 letters :-)
> > >
> > >
> > > (Do I understand your question correctly ?)
> > >
> > >
> > > > On Sun, May 31, 2015 at 10:24 PM FUJIWARA Katsunori <
> > > foozy at lares.dti.ne.jp>
> > > > wrote:
> > > >
> > > > > At Fri, 29 May 2015 18:05:18 +0000,
> > > > > Martin von Zweigbergk wrote:
> > > > > >
> > > > > > [1 <text/plain; UTF-8 (7bit)>]
> > > > > > On Wed, May 27, 2015 at 10:06 AM FUJIWARA Katsunori <
> > > > > foozy at lares.dti.ne.jp>
> > > > > > wrote:
> > > > > >
> > > > > > > # HG changeset patch
> > > > > > > # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> > > > > > > # Date 1432745859 -32400
> > > > > > > # Thu May 28 01:57:39 2015 +0900
> > > > > > > # Node ID b82f32b8734b109b84f533dc62dd3f23195d1e0a
> > > > > > > # Parent ab9f120295b59933d1acd72771f01b5fac8d221d
> > > > > > > localrepo: invoke dirstate.unsureifambig in wwrite for safety
> > > > > > >
> > > > > > > Modified file may be mis-recognized as clean by dirstate, if
> mode,
> > > > > > > size and timestamp of it on the filesystem aren't changed.
> > > > > > >
> > > > > > > To avoid such ambiguous situation, this patch invokes
> > > > > > > `dirstate.unsureifambig()` at the end of
> > > `localrepository.wwrite()`.
> > > > > > >
> > > > > > > When file is cleanly reverted or updated, subsequent
> > > > > > > `dirstate.lookup()` makes `dirstate.unsureifambig()` a little
> > > > > > > redundant. But it is enough cheap for keeping consistency.
> > > > > > >
> > > > > > > diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
> > > > > > > --- a/mercurial/localrepo.py
> > > > > > > +++ b/mercurial/localrepo.py
> > > > > > > @@ -927,7 +927,9 @@
> > > > > > > self.wvfs.write(filename, data)
> > > > > > > if 'x' in flags:
> > > > > > > self.wvfs.setflags(filename, False, True)
> > > > > > > - return len(data)
> > > > > > > + wsize = len(data)
> > > > > > > + self.dirstate.unsureifambig(filename, wsize)
> > > > > > > + return wsize
> > > > > > >
> > > > > > > def wwritedata(self, filename, data):
> > > > > > > return self._filter(self._decodefilterpats, filename,
> > > data)
> > > > > > > diff --git a/tests/test-merge1.t b/tests/test-merge1.t
> > > > > > > --- a/tests/test-merge1.t
> > > > > > > +++ b/tests/test-merge1.t
> > > > > > > @@ -206,4 +206,60 @@
> > > > > > > $ hg revert -r -2 b
> > > > > > > $ hg up -q -- -2
> > > > > > >
> > > > > > > +Test for ambiguity from same size, timestamp and mode
> > > > > > > +
> > > > > > > + $ cat > $TESTTMP/abort.py <<EOF
> > > > > > > + > from mercurial import extensions, merge, util
> > > > > > > + > def applyupdates(orig, *args, **kwargs):
> > > > > > > + > orig(*args, **kwargs)
> > > > > > > + > # emulate aborting before "recordupdates()"
> > > > > > > + > # => files are changed without updating dirstate
> > > > > > > + > raise util.Abort('intentional aborting')
> > > > > > > + > def extsetup(ui):
> > > > > > > + > extensions.wrapfunction(merge, "applyupdates",
> > > applyupdates)
> > > > > > > + > EOF
> > > > > > > +
> > > > > > > +(file gotten from other revision)
> > > > > > > +
> > > > > > > + $ hg update -q -C 2
> > > > > > > + $ echo 'THIS IS FILE B5' > b
> > > > > > > + $ hg commit -m 'commit #5'
> > > > > > > +
> > > > > > > + $ hg update -q -C 3
> > > > > > > + $ touch -t 200001010000 b
> > > > > > > + $ hg status -A b
> > > > > > > + C b
> > > > > > > + $ cat b
> > > > > > > + This is file b1
> > > > > > > +
> > > > > > > + $ hg --config extensions.abort=$TESTTMP/abort.py merge 5
> > > > > > > + abort: intentional aborting
> > > > > > > + [255]
> > > > > > > + $ touch -t 200001010000 b
> > > > > > >
> > > > > >
> > > > > > I'm sorry I'm slow, but I'm still trying to understand this
> series.
> > > This
> > > > > is
> > > > > > of course an artificial way of reproducing the problem you were
> > > seeing,
> > > > > but
> > > > > > how does it sometimes happen in real life? Since the dirstate is
> > > written
> > > > > on
> > > > > > the 'hg status' call above, and that command is run after (not
> > > within the
> > > > > > same second, as we discussed on path 1/5) the file was modified
> on
> > > disk,
> > > > > > how can the 'hg merge' command be run at the earlier time again
> > > (which
> > > > > you
> > > > > > seem to be simulating here by setting the same timestamp)?
> > > > >
> > > > > Sorry for not enough explanation. I'll add more detailed
> explanation
> > > > > in revised version.
> > > > >
> > > > > Core of `hg update` and `hg merge` consists of steps below:
> > > > >
> > > > > 1. calculate result of updating and examine some additional
> points
> > > below
> > > > >
> > > > > - `_checkunknownfiles()` (implied by `calculateupdates()`)
> > > > > - `_checkcollision()`
> > > > >
> > > > > 2. update files in the working directory in
> `merge.applyupdates()`
> > > > >
> > > > > this also implies updating files in subrepositories
> recursively
> > > > > (= `preupdate`/`update` hooks in them are also executed)
> > > > >
> > > > > 3. update dirstate in `merge.recordupdates()`
> > > > >
> > > > > 4. write in-memory dirstate changes out at releasing wlock
> > > > >
> > > > >
> > > > > Tests in this patch emulate that `hg update` and `hg merge` (at
> top of
> > > > > subrepo tree) are aborted between (2) and (3) by:
> > > > >
> > > > > - invalid updating in subrepositories
> > > > >
> > > > > for example:
> > > > > - uncommitted changes
> > > > > - collision against unknown files in the working directory
> > > > > - case-insensitive collision between files in new revision (and
> > > > > current one at merging)
> > > > > - failure of `preupdate` or `update` hooks
> > > > >
> > > > > - keyboard interruption by user, file I/O error and so on
> > > > >
> > > > > `test-subrepo.t` actually treats the former situation, and
> > > > > 6becb9dbca25 tried to fix (part of) this issue. Tests in this patch
> > > > > can be said as "generalized version of test in 6becb9dbca25".
> > > > >
> > > > >
> > > > > Even though some existing tests are enough as a white box test
> > > > > examining whether `repo.wwrite()` invokes
> `dirstate.unsureifambig()`
> > > > > correctly, testing in multiple situations should be useful to avoid
> > > > > unexpected regression in the future, IMHO.
> > > > >
> > > > >
> > > > >
> > > > > > > + $ cat b
> > > > > > > + THIS IS FILE B5
> > > > > > > + $ hg status -A b
> > > > > > > + M b
> > > > > > > +
> > > > > > > +(file merged from other revision)
> > > > > > > +
> > > > > > > + $ hg update -q -C 3
> > > > > > > + $ echo 'this is file b6' > b
> > > > > > > + $ hg commit -m 'commit #6'
> > > > > > > + created new head
> > > > > > > + $ touch -t 200001010000 b
> > > > > > > + $ hg status -A b
> > > > > > > + C b
> > > > > > > + $ cat b
> > > > > > > + this is file b6
> > > > > > > +
> > > > > > > + $ hg --config extensions.abort=$TESTTMP/abort.py merge
> --tool
> > > > > > > internal:other 5
> > > > > > > + abort: intentional aborting
> > > > > > > + [255]
> > > > > > > + $ touch -t 200001010000 b
> > > > > > > + $ cat b
> > > > > > > + THIS IS FILE B5
> > > > > > > + $ hg status -A b
> > > > > > > + M b
> > > > > > > +
> > > > > > > $ cd ..
> > > > > > > _______________________________________________
> > > > > > > Mercurial-devel mailing list
> > > > > > > Mercurial-devel at selenic.com
> > > > > > > https://selenic.com/mailman/listinfo/mercurial-devel
> > > > > > >
> > > > > > [2 <text/html; UTF-8 (quoted-printable)>]
> > > > > >
> > > > >
> > > > >
> ----------------------------------------------------------------------
> > > > > [FUJIWARA Katsunori]
> foozy at lares.dti.ne.jp
> > > > >
> > > > [2 <text/html; UTF-8 (quoted-printable)>]
> > > >
> > >
> > > ----------------------------------------------------------------------
> > > [FUJIWARA Katsunori] foozy at lares.dti.ne.jp
> > >
> > [2 <text/html; UTF-8 (quoted-printable)>]
> >
>
> ----------------------------------------------------------------------
> [FUJIWARA Katsunori] foozy at lares.dti.ne.jp
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20150602/9d3e0d60/attachment.html>
More information about the Mercurial-devel
mailing list