D5058: match: optimize matcher when all patterns are of rootfilesin kind

martinvonz (Martin von Zweigbergk) phabricator at mercurial-scm.org
Sat Oct 13 09:20:54 UTC 2018


martinvonz created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Internally at Google, we use narrowspecs with only rootfilesin-kind
  patterns. Sometimes there are thousands of such patterns
  (i.e. thousands of tracked directories). In such cases, it can take
  quite a long time to build and evaluate the resulting matcher.
  
  This patch optimizes matchers whose patterns are all of the
  rootfilesin kind: instead of creating a regular expression, it
  matches the given file's directory against the set of directories.
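  
  The idea can be sketched outside of Mercurial as follows. This is a
  minimal, standalone illustration, not the code in match.py: the
  function names build_rootfilesin_matcher and build_regex_matcher are
  hypothetical, paths are plain str rather than bytes, and the regex
  variant only approximates the patterns shown in tests/test-walk.t.

```python
import re


def build_rootfilesin_matcher(dirs):
    """Return a predicate that is true for files directly inside `dirs`.

    Hypothetical helper: one set lookup per file instead of one regex
    evaluation over an alternation of all directories.
    """
    dirset = set(dirs)

    def match(path):
        # Everything before the last '/' is the parent directory;
        # files at the repository root are filed under '.'.
        i = path.rfind('/')
        parent = path[:i] if i >= 0 else '.'
        return parent in dirset

    return match


def build_regex_matcher(dirs):
    """Regex-based equivalent, shown only for comparison (assumed shape)."""
    parts = []
    for d in dirs:
        prefix = '' if d == '.' else re.escape(d) + '/'
        parts.append(prefix + '[^/]+$')
    pattern = re.compile('(?:%s)' % '|'.join(parts))
    return lambda path: pattern.match(path) is not None


if __name__ == '__main__':
    tracked = ['.', 'beans', 'mammals']
    by_set = build_rootfilesin_matcher(tracked)
    by_regex = build_regex_matcher(tracked)
    for p in ['fennel', 'beans/black', 'beans/sub/x', 'other/x']:
        print(p, by_set(p), by_regex(p))
```

  Building the set is linear in the number of directories and each
  lookup is a single hash probe, whereas the regex grows with every
  added directory, which is presumably where the time goes with
  thousands of patterns.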
  
  In a repo with ~3600 tracked directories, it takes about 1.35 s to
  build the matcher and 2.7 s to walk the dirstate before this
  patch. After, it takes 0.04 s to create the matcher and 0.87 s to walk
  the dirstate.
  
  It may be worthwhile to do similar optimizations for e.g. patterns of
  type "kind:", but that's not a priority for us right now.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5058

AFFECTED FILES
  mercurial/match.py
  tests/test-walk.t

CHANGE DETAILS

diff --git a/tests/test-walk.t b/tests/test-walk.t
--- a/tests/test-walk.t
+++ b/tests/test-walk.t
@@ -143,33 +143,33 @@
 
   $ hg debugwalk -v 'rootfilesin:'
   * matcher:
-  <patternmatcher patterns='(?:[^/]+$)'>
+  <patternmatcher patterns="rootfilesin: ['.']">
   f  fennel      ../fennel
   f  fenugreek   ../fenugreek
   f  fiddlehead  ../fiddlehead
   $ hg debugwalk -v -I 'rootfilesin:'
   * matcher:
-  <includematcher includes='(?:[^/]+$)'>
+  <includematcher includes="rootfilesin: ['.']">
   f  fennel      ../fennel
   f  fenugreek   ../fenugreek
   f  fiddlehead  ../fiddlehead
   $ hg debugwalk -v 'rootfilesin:.'
   * matcher:
-  <patternmatcher patterns='(?:[^/]+$)'>
+  <patternmatcher patterns="rootfilesin: ['.']">
   f  fennel      ../fennel
   f  fenugreek   ../fenugreek
   f  fiddlehead  ../fiddlehead
   $ hg debugwalk -v -I 'rootfilesin:.'
   * matcher:
-  <includematcher includes='(?:[^/]+$)'>
+  <includematcher includes="rootfilesin: ['.']">
   f  fennel      ../fennel
   f  fenugreek   ../fenugreek
   f  fiddlehead  ../fiddlehead
   $ hg debugwalk -v -X 'rootfilesin:'
   * matcher:
   <differencematcher
     m1=<alwaysmatcher>,
-    m2=<includematcher includes='(?:[^/]+$)'>>
+    m2=<includematcher includes="rootfilesin: ['.']">>
   f  beans/black                     ../beans/black
   f  beans/borlotti                  ../beans/borlotti
   f  beans/kidney                    ../beans/kidney
@@ -182,55 +182,55 @@
   f  mammals/skunk                   skunk
   $ hg debugwalk -v 'rootfilesin:fennel'
   * matcher:
-  <patternmatcher patterns='(?:fennel/[^/]+$)'>
+  <patternmatcher patterns="rootfilesin: ['fennel']">
   $ hg debugwalk -v -I 'rootfilesin:fennel'
   * matcher:
-  <includematcher includes='(?:fennel/[^/]+$)'>
+  <includematcher includes="rootfilesin: ['fennel']">
   $ hg debugwalk -v 'rootfilesin:skunk'
   * matcher:
-  <patternmatcher patterns='(?:skunk/[^/]+$)'>
+  <patternmatcher patterns="rootfilesin: ['skunk']">
   $ hg debugwalk -v -I 'rootfilesin:skunk'
   * matcher:
-  <includematcher includes='(?:skunk/[^/]+$)'>
+  <includematcher includes="rootfilesin: ['skunk']">
   $ hg debugwalk -v 'rootfilesin:beans'
   * matcher:
-  <patternmatcher patterns='(?:beans/[^/]+$)'>
+  <patternmatcher patterns="rootfilesin: ['beans']">
   f  beans/black     ../beans/black
   f  beans/borlotti  ../beans/borlotti
   f  beans/kidney    ../beans/kidney
   f  beans/navy      ../beans/navy
   f  beans/pinto     ../beans/pinto
   f  beans/turtle    ../beans/turtle
   $ hg debugwalk -v -I 'rootfilesin:beans'
   * matcher:
-  <includematcher includes='(?:beans/[^/]+$)'>
+  <includematcher includes="rootfilesin: ['beans']">
   f  beans/black     ../beans/black
   f  beans/borlotti  ../beans/borlotti
   f  beans/kidney    ../beans/kidney
   f  beans/navy      ../beans/navy
   f  beans/pinto     ../beans/pinto
   f  beans/turtle    ../beans/turtle
   $ hg debugwalk -v 'rootfilesin:mammals'
   * matcher:
-  <patternmatcher patterns='(?:mammals/[^/]+$)'>
+  <patternmatcher patterns="rootfilesin: ['mammals']">
   f  mammals/skunk  skunk
   $ hg debugwalk -v -I 'rootfilesin:mammals'
   * matcher:
-  <includematcher includes='(?:mammals/[^/]+$)'>
+  <includematcher includes="rootfilesin: ['mammals']">
   f  mammals/skunk  skunk
   $ hg debugwalk -v 'rootfilesin:mammals/'
   * matcher:
-  <patternmatcher patterns='(?:mammals/[^/]+$)'>
+  <patternmatcher patterns="rootfilesin: ['mammals']">
   f  mammals/skunk  skunk
   $ hg debugwalk -v -I 'rootfilesin:mammals/'
   * matcher:
-  <includematcher includes='(?:mammals/[^/]+$)'>
+  <includematcher includes="rootfilesin: ['mammals']">
   f  mammals/skunk  skunk
   $ hg debugwalk -v -X 'rootfilesin:mammals'
   * matcher:
   <differencematcher
     m1=<alwaysmatcher>,
-    m2=<includematcher includes='(?:mammals/[^/]+$)'>>
+    m2=<includematcher includes="rootfilesin: ['mammals']">>
   f  beans/black                     ../beans/black
   f  beans/borlotti                  ../beans/borlotti
   f  beans/kidney                    ../beans/kidney
diff --git a/mercurial/match.py b/mercurial/match.py
--- a/mercurial/match.py
+++ b/mercurial/match.py
@@ -1164,8 +1164,20 @@
 
     regex = ''
     if kindpats:
-        regex, mf = _buildregexmatch(kindpats, globsuffix)
-        matchfuncs.append(mf)
+        if all(k == 'rootfilesin' for k, p, s in kindpats):
+            dirs = {p for k, p, s in kindpats}
+            def mf(f):
+                i = f.rfind('/')
+                if i >= 0:
+                    dir = f[:i]
+                else:
+                    dir = '.'
+                return dir in dirs
+            regex = b'rootfilesin: %s' % sorted(dirs)
+            matchfuncs.append(mf)
+        else:
+            regex, mf = _buildregexmatch(kindpats, globsuffix)
+            matchfuncs.append(mf)
 
     if len(matchfuncs) == 1:
         return regex, matchfuncs[0]



To: martinvonz, #hg-reviewers
Cc: mercurial-devel

