[PATCH] treemanifest: don't iterate entire matching submanifests on match()
Augie Fackler
raf at durin42.com
Mon Dec 14 14:28:37 CST 2015
On Sun, Dec 13, 2015 at 12:38:28AM -0600, Martin von Zweigbergk wrote:
> # HG changeset patch
> # User Martin von Zweigbergk <martinvonz at google.com>
> # Date 1449943025 28800
> # Sat Dec 12 09:57:05 2015 -0800
> # Node ID f78ef6cfe9d0fd502a219ba0b8e4ff18967f7610
> # Parent 944af8e2eb4cddf96ba5b8a96854528b40979715
> treemanifest: don't iterate entire matching submanifests on match()
Queued this, thanks.
>
> Before 2773540c3650 (match: remove unnecessary optimization where
> visitdir() returns 'all', 2015-05-06), match.visitdir() used to return
> the special value 'all' to indicate that it was known that all
> subdirectories would also be included in the match. The purpose for
> that value was to avoid calling the matcher on all the paths. It
> turned out that calling the matcher was not a problem, so the special
> return value was removed and the code was simplified. However, if we
> use the same special value for not just avoiding calling the matcher
> on each file, but to avoid iterating over each file, it's a much
> bigger win. On commands like
>
> hg st --rev .^ --rev . dom/
>
> we run the matcher (dom/) on the two manifests, then diff the narrowed
> manifest. If the size of the match is much larger than the size of the
> diff, this is wasteful. In the above case, we would end up iterating
> over the 15k-or-so files in dom/ for each of the manifests, only to
> later discover that they are mostly the same. This means that runningt
> the command above is usually slower than getting the status for the
> entire repo, because that code avoids calling treemanifest.match() and
> only calls treemanifest.diff(), which loads only what's needed for the
> diff.
>
> Let's fix this by reintroducing the 'all' value in match.visitdir()
> and making treemanifest.match() return a lazy copy of the manifest
> from dom/ and down (in the above case). This speeds up the above
> command on the Firefox repo from 0.357s to 0.137s (best of 5). The
> wider the match, the bigger the speedup.
>
> diff --git a/mercurial/manifest.py b/mercurial/manifest.py
> --- a/mercurial/manifest.py
> +++ b/mercurial/manifest.py
> @@ -740,9 +740,12 @@
> def _matches(self, match):
> '''recursively generate a new manifest filtered by the match argument.
> '''
> +
> + visit = match.visitdir(self._dir[:-1] or '.')
> + if visit == 'all':
> + return self.copy()
> ret = treemanifest(self._dir)
> -
> - if not match.visitdir(self._dir[:-1] or '.'):
> + if not visit:
> return ret
>
> self._load()
> diff --git a/mercurial/match.py b/mercurial/match.py
> --- a/mercurial/match.py
> +++ b/mercurial/match.py
> @@ -227,9 +227,15 @@
> has potential matches in it or one of its subdirectories. This is
> based on the match's primary, included, and excluded patterns.
>
> + Returns the string 'all' if the given directory and all subdirectories
> + should be visited. Otherwise returns True or False indicating whether
> + the given directory should be visited.
> +
> This function's behavior is undefined if it has returned False for
> one of the dir's parent directories.
> '''
> + if self.prefix() and dir in self._fileroots:
> + return 'all'
> if dir in self._excluderoots:
> return False
> if (self._includeroots and
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at selenic.com
> https://selenic.com/mailman/listinfo/mercurial-devel
More information about the Mercurial-devel
mailing list