[PATCH] fix an unnecessary lookup caused by non-integer mtime returned by os.stat

Wed Oct 15 16:10:46 CDT 2008

# HG changeset patch
# User Dov Feldstern <dfeldstern at fastimap.com>
# Date 1224104241 -7200
# Node ID 4ee66eba1e33605d6f46d91005e97d54e83fd2fe
# Parent  e786192d995d8440a461f051bd43db19b91e4cd8
fix an unnecessary lookup caused by non-integer mtime returned by os.stat

In python 2.5, stat.st_mtime returns type float. If the FS also supports
sub-integer stat values, then the actual values returned may be non-integer.
However, when dirstate is written to disk, the mtime is stored as an integer.
This means that comparisons between the mtime returned by stat and those
stored in dirstate can fail even if the file hasn't been touched since
the last commit.

A partial solution for this was provided way back in 882e703eaa94 (about
two years ago): the time returned by stat is converted to int, and thus
when compared with the value read from dirstate's persistent storage, both
values are ints and the comparison returns the correct result.

However, there is still a problematic scenario: if the comparison is
performed *before* the dirstate is destroyed and the re-read from disk,
then the value stored in it is still a float; and thus the "fix" actually
causes the comparison to fail, because we compare the stored float with
the int-converted stat.mtime. Conversely, the new parse_dirstate also
converts stat.st_mtime to int --- so we can't be sure of stat's type,
either!

Therefore, the current fix is to apply int() to both time and stat.st_mtime
and only then compare them...

A test is provided, which does actually fail if the fix is not applied.
Note: the test is statistical in nature (depending on the actual mtimes,
which could of course just happen to be x.0). On a FS and Python version
known to support sub-integer values, the percentage of the times the test
is skipped should equal the percentage of the times (out of the times it
is not skipped) that the test succeeds even if the bug is not fixed.

diff -r e786192d995d -r 4ee66eba1e33 mercurial/dirstate.py

--- a/mercurial/dirstate.py	Wed Oct 15 20:14:28 2008 +0200
+++ b/mercurial/dirstate.py	Wed Oct 15 22:57:21 2008 +0200
@@ -575,7 +575,7 @@
                     or size == -2
                     or fn in self._copymap):
                     madd(fn)
-                elif time != int(st.st_mtime):
+                elif int(time) != int(st.st_mtime):
                     ladd(fn)
                 elif listclean:
                     cadd(fn)
diff -r e786192d995d -r 4ee66eba1e33 tests/test-unncessary-lookup-non-integer-stat.py
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tests/test-unncessary-lookup-non-integer-stat.py	Wed Oct 15 22:57:21 2008 +0200
@@ -0,0 +1,47 @@
+from mercurial import hg, ui, commands, match as match_
+import os, sys, time
+
+TESTDIR = os.environ['TESTDIR']
+
+# skip if stat mtime returns integer values
+mtime = os.lstat(TESTDIR).st_mtime
+if int(mtime) == mtime:
+    sys.exit(80) # SKIPPED_STATUS defined in run-tests.py
+# even if both python and the FS support sub-integer stat values, the
+# above test may succeed; if so, just run the test again... Actually, the
+# percentage of the times that the test is skipped should be equal to the
+# percentage of times that the test succeeds (out of the times it actually
+# runs) even if there is a bug.
+
+u = ui.ui()
+commands.init(u, '.')
+repo = hg.repository(u, '.')
+
+# create a file...
+f = open('a', 'w')
+f.write('just some text...')
+f.close()
+# ... commit it (but sleep 1 second, so that it doesn't get stat -1)...
+time.sleep(1)
+commands.commit(u, repo, 'a', message='commit', 
+                addremove=True, logfile=None, user=None, date=None)
+# ... and check the status (as returned by dirstate, before the lookups!) of
+# this file. the file should be clean, not lookup:
+s = repo.dirstate.status(match_.always(repo.root, repo.getcwd()),
+                         ignored=True, clean=True, unknown=True)
+print "lookup:"
+print s[0]
+print "clean:"
+print s[7]
+
+# now we'll do the same test, after deleting the repo, so that dirstate will
+# be read from disk
+del repo
+repo = hg.repository(u, '.')
+
+s = repo.dirstate.status(match_.always(repo.root, repo.getcwd()),
+                         ignored=True, clean=True, unknown=True)
+print "lookup:"
+print s[0]
+print "clean:"
+print s[7]
diff -r e786192d995d -r 4ee66eba1e33 tests/test-unncessary-lookup-non-integer-stat.py.out
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tests/test-unncessary-lookup-non-integer-stat.py.out	Wed Oct 15 22:57:21 2008 +0200
@@ -0,0 +1,8 @@
+lookup:
+[]
+clean:
+['a']
+lookup:
+[]
+clean:
+['a']