[PATCH 1 of 2] match: optimize visitdir() for patterns matching only root directory

Martin von Zweigbergk martinvonz at google.com
Fri May 5 16:39:48 UTC 2017


# HG changeset patch
# User Martin von Zweigbergk <martinvonz at google.com>
# Date 1493999347 25200
#      Fri May 05 08:49:07 2017 -0700
# Node ID d89bf290dc63661e5e1cdc48753322c30560c15d
# Parent  2cfdf5241096f6c0c2d45d32b2f1a41575835025
match: optimize visitdir() for patterns matching only root directory

_rootsanddirs() returns a list of directories to visit recursively
and a list of directories to visit non-recursively. For
patterns such as 'rootfilesin:foo/bar', we clearly need to visit the
directory foo/bar, but we also need to visit its parents. The method
therefore uses util.dirs() to find the parent directories of
'foo/bar'. That method does not include the root directory, but since
we obviously need to visit the root directory, we always added '.' to
the set of directories to visit non-recursively.
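
As a rough illustration of the parent-directory step, the sketch below
uses a hypothetical helper, parent_dirs(), as a stand-in for
util.dirs(). It is not Mercurial's implementation; it only mimics the
property described above, namely that the proper ancestors of each path
are returned and the root directory is never among them.

```python
def parent_dirs(paths):
    """Return the proper ancestors of each path, without duplicates.

    Hypothetical stand-in for util.dirs(): the root directory is never
    included in the result.
    """
    seen = []
    for path in paths:
        pos = path.rfind('/')
        while pos != -1:
            path = path[:pos]
            if path not in seen:
                seen.append(path)
            pos = path.rfind('/')
    return seen


# 'foo/bar' has the single ancestor 'foo'; the root is left out.
foo_bar_parents = parent_dirs(['foo/bar'])
# A top-level name, or '.', has no ancestors at all.
top_level_parents = parent_dirs(['g', '.'])
```

Because the helper never yields the root, the caller has to add '.'
itself if the root directory should be visited.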

The visitdir() method had special handling to consider set(['.']) to
mean that no includes had been specified and would thus visit all
directories. However, when the pattern is 'rootfilesin:.', set(['.'])
is actually the real set of directories to visit and the special
handling of that set meant that all directories got visited instead of
just the root directory.
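
The difference can be sketched with two simplified, standalone
predicates. The names visit_old(), visit_new(), _included() and
_ancestors() are hypothetical, and the logic is reduced to the include
handling only (no excludes, no 'all' result), so this is an
approximation of the check rather than the real visitdir().

```python
def _ancestors(path):
    """Yield the proper ancestors of path, excluding the root."""
    pos = path.rfind('/')
    while pos != -1:
        yield path[:pos]
        pos = path.rfind('/', 0, pos)


def _included(d, includeroots, includedirs):
    # A directory is included if it is listed directly or lies below a
    # recursively included root.
    return ('.' in includeroots or
            d in includeroots or
            d in includedirs or
            any(p in includeroots for p in _ancestors(d)))


def visit_old(d, includeroots, includedirs):
    # Old check: {'.'} is taken to mean "no includes were given".
    restricted = bool(includeroots) or includedirs != {'.'}
    return not restricted or _included(d, includeroots, includedirs)


def visit_new(d, includeroots, includedirs):
    # New check: only an empty set means "no includes were given".
    restricted = bool(includeroots) or bool(includedirs)
    return not restricted or _included(d, includeroots, includedirs)


# With 'rootfilesin:.', the non-recursive set is exactly {'.'}.
old_visits_foo = visit_old('foo', set(), {'.'})
new_visits_foo = visit_new('foo', set(), {'.'})
new_visits_root = visit_new('.', set(), {'.'})
```

In this sketch the old predicate visits 'foo' even though only the root
was requested, while the new one visits the root and skips 'foo'.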

The fix is simple: add '.' to the set of parent directories in
_rootsanddirs() and stop treating set(['.']) specially. This makes

  hg files -r .  -I rootfilesin:.

in a treemanifest version of the Firefox repo go from 1.5s to 0.26s on
warm disk (and a *much* bigger improvement on cold disk).

Note that the -I is needed only because we have not yet optimized
visitdir() for regular (non-include, non-exclude) patterns; there is
no deeper reason for it.

diff --git a/mercurial/match.py b/mercurial/match.py
--- a/mercurial/match.py
+++ b/mercurial/match.py
@@ -134,7 +134,7 @@
         self._includeroots = set()
         self._excluderoots = set()
         # dirs are directories which are non-recursively included.
-        self._includedirs = set(['.'])
+        self._includedirs = set()
 
         if badfn is not None:
             self.bad = badfn
@@ -254,7 +254,7 @@
             return 'all'
         if dir in self._excluderoots:
             return False
-        if ((self._includeroots or self._includedirs != set(['.'])) and
+        if ((self._includeroots or self._includedirs) and
             '.' not in self._includeroots and
             dir not in self._includeroots and
             dir not in self._includedirs and
@@ -684,16 +684,16 @@
 
     >>> _rootsanddirs(\
         [('glob', 'g/h/*', ''), ('glob', 'g/h', ''), ('glob', 'g*', '')])
-    (['g/h', 'g/h', '.'], ['g'])
+    (['g/h', 'g/h', '.'], ['g', '.'])
     >>> _rootsanddirs(\
         [('rootfilesin', 'g/h', ''), ('rootfilesin', '', '')])
-    ([], ['g/h', '.', 'g'])
+    ([], ['g/h', '.', 'g', '.'])
     >>> _rootsanddirs(\
         [('relpath', 'r', ''), ('path', 'p/p', ''), ('path', '', '')])
-    (['r', 'p/p', '.'], ['p'])
+    (['r', 'p/p', '.'], ['p', '.'])
     >>> _rootsanddirs(\
         [('relglob', 'rg*', ''), ('re', 're/', ''), ('relre', 'rr', '')])
-    (['.', '.', '.'], [])
+    (['.', '.', '.'], ['.'])
     '''
     r, d = _patternrootsanddirs(kindpats)
 
@@ -701,6 +701,8 @@
     # scanned to get to either the roots or the other exact directories.
     d.extend(util.dirs(d))
     d.extend(util.dirs(r))
+    # util.dirs() does not include the root directory, so add it manually
+    d.append('.')
 
     return r, d
 

