dirstate entries stuck in unset state

Adrian Buehlmann adrian at cadifra.com
Thu Aug 13 11:05:54 CDT 2009


On 13.08.2009 10:44, Simon Heimberg wrote:
> What about this?
> 
> ---a/mercurial/dirstate.py Don Aug 13 05:58:59 2009 +0200
> +++b/mercurial/dirstate.py Don Aug 13 10:19:08 2009 +0200
> @@ -389,7 +389,7 @@
>          for f, e in self._map.iteritems():
>              if f in copymap:
>                  f = "%s\0%s" % (f, copymap[f])
> -            if e[3] > limit and e[0] == 'n':
> +            if limit < e[3] < limit + 8 * gran and e[0] == 'n':
>                  e = (e[0], 0, -1, -1)
>              e = pack(_format, e[0], e[1], e[2], e[3], len(f))
>              write(e)
> 
> There is the risk that the file changes (without modifying the size)
> when the clock time is near the modification time. (This window has a
> size of gran.) This would not happen when 'hg stat' is called after
> 7*gran before this moment. The file state would be set to 'unset'. (Bad
> English. How can I explain this exactly and nicely?)
> I have chosen 8 because the minimum is 2 (+-gran) and a bigger number is
> more secure. A computer can easy multiply with 8.

Very interesting input!

I think we can get away with the "security" factor of eight (which feels
a bit strange anyway).

If I understand matters correctly, I think setting the entries to 'unset'
was done in the first place to protect against the case where the modification
time of the entry in the map is too close to the modification time of the
dirstate file itself (the latter being used as the moment for when a working
dir manipulation was finished). This protection is for the case of doing
hg calls in sufficiently quick succession.

When that wrinkle was inserted into the code (af3f26b6bba4), an implementation
was choosen that just negleted the fact that there might be odd situations
with files having mtimes in the future, so only a check for a single bound was
inserted.

But I believe the original idea of the check was to detect whether
the mtimes could be too close, in order to preclude treating files erroneously
as "not modified" due to having "the same" mtimes and sizes (i.e. mtimes too
close to each other with respect to the granularity).

So why not outright use a -gran...+gran range and check lower _and_ upper
bounds?

This leads me to:

diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
--- a/mercurial/dirstate.py
+++ b/mercurial/dirstate.py
@@ -377,9 +377,10 @@ class dirstate(object):
             gran = int(self._ui.config('dirstate', 'granularity', 1))
         except ValueError:
             gran = 1
-        limit = sys.maxint
         if gran > 0:
-            limit = util.fstat(st).st_mtime - gran
+            mt = util.fstat(st).st_mtime
+            ll = mt - gran
+            hl = mt + gran

         cs = cStringIO.StringIO()
         copymap = self._copymap
@@ -389,7 +390,7 @@ class dirstate(object):
         for f, e in self._map.iteritems():
             if f in copymap:
                 f = "%s\0%s" % (f, copymap[f])
-            if e[3] > limit and e[0] == 'n':
+            if gran > 0 and e[0] == 'n' and ll <= e[3] <= hl:
                 e = (e[0], 0, -1, -1)
             e = pack(_format, e[0], e[1], e[2], e[3], len(f))
             write(e)

Does this make sense?

A quick test shows that this at least fixes the exact test case described at
issue 1790.

And with that patch applied I still get -- as before -- a couple of 'unset'
entries (looking at the output of 'hg debugstate') when doing a non-tiny
'hg update' (e.g. 'hg up null' followed by a 'hg up' on the mercurial repo
itself).


More information about the Mercurial-devel mailing list