[PATCH] treemanifest: don't iterate entire matching submanifests on match()

Augie Fackler raf at durin42.com
Mon Dec 14 14:28:37 CST 2015


On Sun, Dec 13, 2015 at 12:38:28AM -0600, Martin von Zweigbergk wrote:
> # HG changeset patch
> # User Martin von Zweigbergk <martinvonz at google.com>
> # Date 1449943025 28800
> #      Sat Dec 12 09:57:05 2015 -0800
> # Node ID f78ef6cfe9d0fd502a219ba0b8e4ff18967f7610
> # Parent  944af8e2eb4cddf96ba5b8a96854528b40979715
> treemanifest: don't iterate entire matching submanifests on match()

Queued this, thanks.

>
> Before 2773540c3650 (match: remove unnecessary optimization where
> visitdir() returns 'all', 2015-05-06), match.visitdir() used to return
> the special value 'all' to indicate that it was known that all
> subdirectories would also be included in the match. The purpose for
> that value was to avoid calling the matcher on all the paths. It
> turned out that calling the matcher was not a problem, so the special
> return value was removed and the code was simplified. However, if we
> use the same special value for not just avoiding calling the matcher
> on each file, but to avoid iterating over each file, it's a much
> bigger win. On commands like
>
>   hg st --rev .^ --rev . dom/
>
> we run the matcher (dom/) on the two manifests, then diff the narrowed
> manifest. If the size of the match is much larger than the size of the
> diff, this is wasteful. In the above case, we would end up iterating
> over the 15k-or-so files in dom/ for each of the manifests, only to
> later discover that they are mostly the same. This means that runningt
> the command above is usually slower than getting the status for the
> entire repo, because that code avoids calling treemanifest.match() and
> only calls treemanifest.diff(), which loads only what's needed for the
> diff.
>
> Let's fix this by reintroducing the 'all' value in match.visitdir()
> and making treemanifest.match() return a lazy copy of the manifest
> from dom/ and down (in the above case). This speeds up the above
> command on the Firefox repo from 0.357s to 0.137s (best of 5). The
> wider the match, the bigger the speedup.
>
> diff --git a/mercurial/manifest.py b/mercurial/manifest.py
> --- a/mercurial/manifest.py
> +++ b/mercurial/manifest.py
> @@ -740,9 +740,12 @@
>      def _matches(self, match):
>          '''recursively generate a new manifest filtered by the match argument.
>          '''
> +
> +        visit = match.visitdir(self._dir[:-1] or '.')
> +        if visit == 'all':
> +            return self.copy()
>          ret = treemanifest(self._dir)
> -
> -        if not match.visitdir(self._dir[:-1] or '.'):
> +        if not visit:
>              return ret
>
>          self._load()
> diff --git a/mercurial/match.py b/mercurial/match.py
> --- a/mercurial/match.py
> +++ b/mercurial/match.py
> @@ -227,9 +227,15 @@
>          has potential matches in it or one of its subdirectories. This is
>          based on the match's primary, included, and excluded patterns.
>
> +        Returns the string 'all' if the given directory and all subdirectories
> +        should be visited. Otherwise returns True or False indicating whether
> +        the given directory should be visited.
> +
>          This function's behavior is undefined if it has returned False for
>          one of the dir's parent directories.
>          '''
> +        if self.prefix() and dir in self._fileroots:
> +            return 'all'
>          if dir in self._excluderoots:
>              return False
>          if (self._includeroots and
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at selenic.com
> https://selenic.com/mailman/listinfo/mercurial-devel


More information about the Mercurial-devel mailing list