[PATCH] treemanifest: don't iterate entire matching submanifests on match()

Martin von Zweigbergk martinvonz at google.com
Sun Dec 13 06:38:28 UTC 2015


# HG changeset patch
# User Martin von Zweigbergk <martinvonz at google.com>
# Date 1449943025 28800
#      Sat Dec 12 09:57:05 2015 -0800
# Node ID f78ef6cfe9d0fd502a219ba0b8e4ff18967f7610
# Parent  944af8e2eb4cddf96ba5b8a96854528b40979715
treemanifest: don't iterate entire matching submanifests on match()

Before 2773540c3650 (match: remove unnecessary optimization where
visitdir() returns 'all', 2015-05-06), match.visitdir() used to return
the special value 'all' to indicate that it was known that all
subdirectories would also be included in the match. The purpose of
that value was to avoid calling the matcher on all the paths. It
turned out that calling the matcher was not a problem, so the special
return value was removed and the code was simplified. However, if we
use the same special value not just to avoid calling the matcher on
each file, but to avoid iterating over the files at all, it's a much
bigger win. On commands like

  hg st --rev .^ --rev . dom/

we run the matcher (dom/) on the two manifests, then diff the narrowed
manifests. If the size of the match is much larger than the size of the
diff, this is wasteful. In the above case, we would end up iterating
over the 15k-or-so files in dom/ for each of the manifests, only to
later discover that they are mostly the same. This means that running
the command above is usually slower than getting the status for the
entire repo, because that code avoids calling treemanifest.match() and
only calls treemanifest.diff(), which loads only what's needed for the
diff.

Let's fix this by reintroducing the 'all' value in match.visitdir()
and making treemanifest.match() return a lazy copy of the manifest
from dom/ and down (in the above case). This speeds up the above
command on the Firefox repo from 0.357s to 0.137s (best of 5). The
wider the match, the bigger the speedup.
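
To illustrate the idea outside of Mercurial, here is a minimal toy
sketch of the short-circuit. Everything in it (ToyTree, under(), and
the single-prefix visitdir() and matches() functions) is hypothetical
and only mimics the shape of the real code; in particular, the toy
returns the subtree object itself where the patch returns self.copy().

```python
class ToyTree:
    """Hypothetical stand-in for a tree manifest (not Mercurial's class)."""

    def __init__(self, path, files=None, subdirs=None):
        self.path = path                    # '.' for the root, else 'dom', ...
        self.files = dict(files or {})      # {filename: node}
        self.subdirs = dict(subdirs or {})  # {dirname: ToyTree}

    def walk(self):
        """Yield full file paths, files first, then subdirectories."""
        for name in sorted(self.files):
            yield name if self.path == '.' else self.path + '/' + name
        for name in sorted(self.subdirs):
            yield from self.subdirs[name].walk()


def under(prefix, path):
    """True if path is the prefix itself or lies below it."""
    return path == prefix or path.startswith(prefix + '/')


def visitdir(prefix, dirpath):
    """Toy tri-state check: 'all', True (ancestor of prefix) or False."""
    if under(prefix, dirpath):
        return 'all'
    return dirpath == '.' or prefix.startswith(dirpath + '/')


def matches(tree, prefix, examined):
    """Narrow tree to prefix, recording each file name that gets examined."""
    visit = visitdir(prefix, tree.path)
    if visit == 'all':
        return tree  # whole subtree matches: reuse it without iterating
    ret = ToyTree(tree.path)
    if not visit:
        return ret
    for name, node in tree.files.items():
        examined.append(name)
        full = name if tree.path == '.' else tree.path + '/' + name
        if under(prefix, full):
            ret.files[name] = node
    for name, sub in tree.subdirs.items():
        m = matches(sub, prefix, examined)
        if m.files or m.subdirs:
            ret.subdirs[name] = m
    return ret


root = ToyTree('.', {'README': 'n0'}, {
    'dom': ToyTree('dom', {'a.js': 'n1', 'b.js': 'n2'},
                   {'media': ToyTree('dom/media', {'c.js': 'n3'})}),
    'js': ToyTree('js', {'x.c': 'n4'}),
})
examined = []
narrowed = matches(root, 'dom', examined)
```

In this toy, only the root's README is ever examined; nothing under
dom/ is iterated during the match, which is the effect the patch is
after for the real treemanifest.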

diff --git a/mercurial/manifest.py b/mercurial/manifest.py
--- a/mercurial/manifest.py
+++ b/mercurial/manifest.py
@@ -740,9 +740,12 @@
     def _matches(self, match):
         '''recursively generate a new manifest filtered by the match argument.
         '''
+
+        visit = match.visitdir(self._dir[:-1] or '.')
+        if visit == 'all':
+            return self.copy()
         ret = treemanifest(self._dir)
-
-        if not match.visitdir(self._dir[:-1] or '.'):
+        if not visit:
             return ret
 
         self._load()
diff --git a/mercurial/match.py b/mercurial/match.py
--- a/mercurial/match.py
+++ b/mercurial/match.py
@@ -227,9 +227,15 @@
         has potential matches in it or one of its subdirectories. This is
         based on the match's primary, included, and excluded patterns.
 
+        Returns the string 'all' if the given directory and all subdirectories
+        should be visited. Otherwise returns True or False indicating whether
+        the given directory should be visited.
+
         This function's behavior is undefined if it has returned False for
         one of the dir's parent directories.
         '''
+        if self.prefix() and dir in self._fileroots:
+            return 'all'
         if dir in self._excluderoots:
             return False
         if (self._includeroots and

