[PATCH] fix an unnecessary lookup caused by integer vs. non-integer mtimes

Thu Oct 16 10:36:05 CDT 2008

# HG changeset patch
# User Dov Feldstern <dfeldstern at fastimap.com>
# Date 1224171156 -7200
# Node ID 69fb9e403585f073c0f6fdca47993a4f8f86d94f
# Parent  9514cbb6e4f6f54ac8437e7a4eb160a39986a7eb
fix an unnecessary lookup caused by integer vs. non-integer mtimes

The type (int or float) of the various variables representing mtime
depends on a few different factors:

 * Python version: < 2.5, os.lstat() mtime is int, >= 2.5 is float;
 * stat objects are sometimes obtained indirectly, from osutil.listdir().
   The C implementation of this function returns mtime as int, regardless
   of the python version;
 * in dirstate's persistent storage, stat mtimes are stored as ints;
   so the mtimes obtained from dirstate will also depend on whether
   it has been reread from persistent storage yet or not.

In addition to all of the above, there's also the question of whether
the OS/FS support sub-integer stat mtime values. Windows/NTFS apparently
does, for example.

So, on such a system which *does* support subsecond mtimes, we may
encounter the following problem: when comparing a file's mtime to the
stored mtime in order to see if the file has been modified, if one of
the mtimes being compared has been rounded to int, and the other hasn't,
an unmodified file will appear to have been modified. This will result in
an unnecessary "lookup" (i.e., comparison of the full contents of the file
to the stored version).

A partial solution for this was provided way back in 882e703eaa94 (about
two years ago): the time returned by stat is converted to int, and thus
when compared with the value read from dirstate's persistent storage, both
values are ints and the comparison returns the correct result.

However, that only solves the case where the value in dirstate is already
int; as we've seen, it may also be float, and the value obtained from stat
may actually be int!

Therefore, the current fix is to apply int() to both time and stat.st_mtime
and only then compare them...

A test is provided, which does actually fail (on Windows XP, NTFS) if the
fix is not applied.

Note: the test is statistical in nature (depending on the actual mtimes,
which could of course just happen to be x.0). On a FS and Python version
known to support sub-integer values, the percentage of the times the test
is skipped should equal the percentage of the times (out of the times it
is not skipped) that the test succeeds even if the bug is not fixed.

diff -r 9514cbb6e4f6 -r 69fb9e403585 mercurial/dirstate.py

--- a/mercurial/dirstate.py	Tue Oct 14 20:13:53 2008 +0200
+++ b/mercurial/dirstate.py	Thu Oct 16 17:32:36 2008 +0200
@@ -575,7 +575,14 @@
                     or size == -2
                     or fn in self._copymap):
                     madd(fn)
-                elif time != int(st.st_mtime):
+                # st.st_mtime may be int or float, depending on the type 
+                # returned by os.lstat (int in python < 2.5, float >= 2.5),
+                # as well as the type returned in osutil.listdir (in the C
+                # implementation, int is returned); and time may be int or
+                # float, depending also on whether the dirstate has been
+                # re-read from persistent storage, where it is stored as int.
+                # So it's just simpler to convert everything to int():
+                elif int(time) != int(st.st_mtime):
                     ladd(fn)
                 elif listclean:
                     cadd(fn)
diff -r 9514cbb6e4f6 -r 69fb9e403585 tests/test-unncessary-lookup-non-integer-stat.py
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tests/test-unncessary-lookup-non-integer-stat.py	Thu Oct 16 17:32:36 2008 +0200
@@ -0,0 +1,47 @@
+from mercurial import hg, ui, commands, match as match_
+import os, sys, time
+
+TESTDIR = os.environ['TESTDIR']
+
+# skip if stat mtime returns integer values
+mtime = os.lstat(TESTDIR).st_mtime
+if int(mtime) == mtime:
+    sys.exit(80) # SKIPPED_STATUS defined in run-tests.py
+# even if both python and the FS support sub-integer stat values, the
+# above test may succeed; if so, just run the test again... Actually, the
+# percentage of the times that the test is skipped should be equal to the
+# percentage of times that the test succeeds (out of the times it actually
+# runs) even if there is a bug.
+
+u = ui.ui()
+commands.init(u, '.')
+repo = hg.repository(u, '.')
+
+# create a file...
+f = open('a', 'w')
+f.write('just some text...')
+f.close()
+# ... commit it (but sleep 1 second, so that it doesn't get stat -1)...
+time.sleep(1)
+commands.commit(u, repo, 'a', message='commit', 
+                addremove=True, logfile=None, user=None, date=None)
+# ... and check the status (as returned by dirstate, before the lookups!) of
+# this file. the file should be clean, not lookup:
+s = repo.dirstate.status(match_.always(repo.root, repo.getcwd()),
+                         ignored=True, clean=True, unknown=True)
+print "lookup:"
+print s[0]
+print "clean:"
+print s[7]
+
+# now we'll do the same test, after deleting the repo, so that dirstate will
+# be read from disk
+del repo
+repo = hg.repository(u, '.')
+
+s = repo.dirstate.status(match_.always(repo.root, repo.getcwd()),
+                         ignored=True, clean=True, unknown=True)
+print "lookup:"
+print s[0]
+print "clean:"
+print s[7]
diff -r 9514cbb6e4f6 -r 69fb9e403585 tests/test-unncessary-lookup-non-integer-stat.py.out
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tests/test-unncessary-lookup-non-integer-stat.py.out	Thu Oct 16 17:32:36 2008 +0200
@@ -0,0 +1,8 @@
+lookup:
+[]
+clean:
+['a']
+lookup:
+[]
+clean:
+['a']