[PATCH] hg_log: speed up hg log for untracked files (issue1340)

S Muralidhar smuralid at yahoo.com
Mon Sep 10 23:49:29 CDT 2012


# HG changeset patch
# User smuralid
# Date 1347338107 25200
# Node ID 28f72ea4d93a27650759874ed8c5928f69a3b37c
# Parent  fc14953e8e34667181cd0d492825342b5e1e880b
hg_log: speed up hg log for untracked files (issue1340)

'hg log' on untracked files tends to be fairly slow. The root cause is that we fall back to the 'slowpath' whenever we can't find a revlog for one of the files listed. This happens when the path in question is an untracked file or a directory.
This diff speeds up 'hg log' (by avoiding the slowpath) for a path when we can determine that it is not, and never was, a directory. Since we don't have a separate revlog for directories, I'm taking advantage of the fact that a directory that is (or once was) part of the repo still has a corresponding directory under .hg/store/data.
'hg log' will continue to be slow for directories; that's for the next diff.
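
As a rough illustration of the decision described above (not the actual cmdutil code), the per-path choice can be sketched as follows. The function name and the two callables are hypothetical stand-ins for the filelog-length check and repo.containsdir(); the --follow abort case is left out.

```python
def needs_slowpath(paths, has_filelog, was_directory):
    """Return True if any path forces the slow (full history scan) path.

    has_filelog and was_directory are hypothetical callables standing in
    for the filelog-length check and repo.containsdir() in the patch.
    """
    for path in paths:
        if has_filelog(path):
            continue          # fast path: this file has its own revlog
        if was_directory(path):
            return True       # directories have no revlog: take the slowpath
        # Never a file and never a directory: nothing to log, skip it.
    return False


tracked = {"d1/f1"}   # assumed set of paths that have a filelog
dirs = {"d1"}         # assumed set of paths that are (or were) directories

print(needs_slowpath(["d1/f1", "f1"], tracked.__contains__, dirs.__contains__))  # False
print(needs_slowpath(["d1"], tracked.__contains__, dirs.__contains__))           # True
```

With these sample sets, the untracked name "f1" no longer triggers the slowpath, while the directory "d1" still does.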

This iteration of the diff pushes the core path-existence logic into the store layer (specifically into fncachestore and basicstore), based on Matt's feedback.
basicstore simply tests whether the path exists on the filesystem. fncachestore looks at its cache first to see whether the path exists; if not, it calls its superclass (basicstore) to test whether a directory by that name exists.
An alternate approach (in fncachestore) would have been to rely on the cache's __contains__ method even for directories. The fncache currently doesn't track directories at all, and it seemed inappropriate to extend it: the change would be more involved, a larger cache has its downsides, and there are some (slight) gotchas with forcing a rebuild of the fncache (for existing repos) so that it includes directories.
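
The two-tier lookup described above can be sketched in isolation as below. This is a simplified model, not the store.py code: the class names, the plain set used as the cache, and the demo() helper are assumptions made for illustration, and the real stores go through an opener rather than os.path directly.

```python
import os
import tempfile


class BasicStore:
    """Hypothetical stand-in for basicstore: asks the filesystem."""

    def __init__(self, root):
        self.root = root

    def join(self, path):
        return os.path.join(self.root, path)

    def __contains__(self, path):
        return os.path.exists(self.join(path))


class FncacheStore(BasicStore):
    """Hypothetical stand-in for fncachestore: cache first, then disk."""

    def __init__(self, root, cached_files):
        super().__init__(root)
        self.fncache = set(cached_files)  # the cache holds files only

    def __contains__(self, path):
        path = "/".join(("data", path))
        if path in self.fncache:
            return True
        # The cache knows nothing about directories, so fall back to
        # the filesystem check in the base class.
        return super().__contains__(path)


def demo():
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "data", "d1"))
        store = FncacheStore(root, ["data/d1/f1.i"])
        # cache hit, directory found on disk, unknown path
        return ("d1/f1.i" in store, "d1/" in store, "nope/" in store)


print(demo())
```

Keeping the cache file-only means existing repos need no fncache rebuild; the cost is one filesystem check per cache miss.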

diff --git a/mercurial/cmdutil.py b/mercurial/cmdutil.py
--- a/mercurial/cmdutil.py
+++ b/mercurial/cmdutil.py
@@ -1067,8 +1067,17 @@
                     if follow:
                         raise util.Abort(
                             _('cannot follow nonexistent file: "%s"') % file_)
-                    slowpath = True
-                    break
+                    # Check to see if this was ever a directory. If not, then
+                    # we know that no such file/directory with this name
+                    # existed (as evidenced by the zero-length filelog) - so,
+                    # there's nothing to log.
+                    # For directories, we still need to take the slow path
+                    if repo.containsdir(file_):
+                        repo.ui.debug("%s may be a directory" % file_)
+                        slowpath = True
+                        break
+                    else:
+                        continue
                 else:
                     continue
 
diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -788,6 +788,17 @@
     def wjoin(self, f):
         return os.path.join(self.root, f)
 
+    def containsdir(self, d):
+        """Determines if 'd' is currently (or was in the past) a
+        valid directory in the repo"""
+        if d.startswith('/'):
+            d = d[1:]
+        if d == ".":
+            return True
+        if not d.endswith('/'):
+            d = d + '/'
+        return d in self.store
+
     def file(self, f):
         if f[0] == '/':
             f = f[1:]
diff --git a/mercurial/store.py b/mercurial/store.py
--- a/mercurial/store.py
+++ b/mercurial/store.py
@@ -286,6 +286,10 @@
     def write(self):
         pass
 
+    def __contains__(self, path):
+        '''Checks if this path exists in the store'''
+        return self.opener.exists(path)
+
 class encodedstore(basicstore):
     def __init__(self, path, openertype):
         self.path = path + '/store'
@@ -378,6 +382,9 @@
             self.fncache.add(path)
         return self.opener(self.encode(path), mode, *args, **kw)
 
+    def join(self, f):
+        return self.opener.join(f)
+
 class fncachestore(basicstore):
     def __init__(self, path, openertype, encode):
         self.encode = encode
@@ -422,6 +429,17 @@
     def write(self):
         self.fncache.write()
 
+    def __contains__(self, path):
+        '''Checks if this path exists in the store'''
+        path = "/".join(("data", path))
+        if path in self.fncache:
+            return True
+        else:
+            # Didn't find 'path' in cache.
+            # fncache does not keep info on directories - and this may be
+            # a directory, so ask the underlying store about it.
+            return super(fncachestore, self).__contains__(self.join(path))
+
 def store(requirements, path, openertype):
     if 'store' in requirements:
         if 'fncache' in requirements:
diff --git a/tests/test-glog.t b/tests/test-glog.t
--- a/tests/test-glog.t
+++ b/tests/test-glog.t
@@ -1606,6 +1606,11 @@
             ('string', 'd:relpath'))
           ('string', 'p:a'))
         ('string', 'p:c'))))
+  --- log.nodes	* (glob)
+  +++ glog.nodes	* (glob)
+  @@ -1 +1,2 @@
+  +nodetag 3
+   nodetag 0
 
 Test multiple --include/--exclude/paths
 
diff --git a/tests/test-log.t b/tests/test-log.t
--- a/tests/test-log.t
+++ b/tests/test-log.t
@@ -1213,3 +1213,52 @@
   1
 
   $ cd ..
+
+test hg log on non-existent files and on directories
+  $ hg init issue1340
+  $ cd issue1340
+  $ mkdir d1; mkdir D2; mkdir D3.i; mkdir d4.hg; mkdir d5.d; mkdir .d6
+  $ echo 1 > d1/f1
+  $ echo 1 > D2/f1
+  $ echo 1 > D3.i/f1
+  $ echo 1 > d4.hg/f1
+  $ echo 1 > d5.d/f1
+  $ echo 1 > .d6/f1
+  $ hg add .
+  adding .d6/f1
+  adding D2/f1
+  adding D3.i/f1
+  adding d1/f1
+  adding d4.hg/f1
+  adding d5.d/f1
+  $ hg commit -m "a bunch of weird directories"
+  $ hg log -l1 d1/f1 | grep changeset
+  changeset:   0:65624cd9070a
+  $ hg log -l1 f1
+  $ hg log -l1 . | grep changeset
+  changeset:   0:65624cd9070a
+  $ hg log -l1 ./ | grep changeset
+  changeset:   0:65624cd9070a
+  $ hg log -l1 d1 | grep changeset
+  changeset:   0:65624cd9070a
+  $ hg log -l1 D2 | grep changeset
+  changeset:   0:65624cd9070a
+  $ hg log -l1 D2/f1 | grep changeset
+  changeset:   0:65624cd9070a
+  $ hg log -l1 D3.i | grep changeset
+  changeset:   0:65624cd9070a
+  $ hg log -l1 D3.i/f1 | grep changeset
+  changeset:   0:65624cd9070a
+  $ hg log -l1 d4.hg | grep changeset
+  changeset:   0:65624cd9070a
+  $ hg log -l1 d4.hg/f1 | grep changeset
+  changeset:   0:65624cd9070a
+  $ hg log -l1 d5.d | grep changeset
+  changeset:   0:65624cd9070a
+  $ hg log -l1 d5.d/f1 | grep changeset
+  changeset:   0:65624cd9070a
+  $ hg log -l1 .d6 | grep changeset
+  changeset:   0:65624cd9070a
+  $ hg log -l1 .d6/f1 | grep changeset
+  changeset:   0:65624cd9070a
+  $ cd ..

