[PATCH 4 of 4] dirstate: speed up case collision checking

Bryan O'Sullivan bos at serpentine.com
Fri Jun 22 16:41:08 CDT 2012


On Fri, Jun 22, 2012 at 4:49 PM, Joshua Redstone <joshua.redstone at fb.com>wrote:

>
> scmutil.casecollisionauditor is slow for large repos.  Switch to a faster
> implementation based on reducing the fraction of all files that are scanned
> and caching the results of case conversion lazily rather than precomputing.
>

Nice idea. Do you happen to know how much difference it makes?

-        cca = scmutil.casecollisionauditor(ui, abort, wctx)
> +        cca = scmutil.casecollisionauditor(repo.dirstate.sortedfiles())
>

Ah, so this is where sortedfiles was coming from in that previous patch.

I'm not sure it helps to have both sortedfiles and _sortedfiles. Why not
just have one here? Just following the pattern used for _dirs and dirs?

+++ b/mercurial/scmutil.py
> +    def __init__(self, sortedfiles):
> +        self._lowerfiles = set()
> +        self._lowereddirs = set()
> +        self._sortedfiles = sortedfiles
>

One small naming nit: we have lower_ed_dirs, sort_ed_files, but lowerfiles
instead of lower_ed_files. It would be nice to have the names consistent
unless they do different things.

+    def collides(self, path):
> +        """
> +        Check if path has a case collision with another file, either
> +        one we've checked before or one in a sorted list of files.
> +        Returns true iff there is a collision.
> +        """
> +        dirn = os.path.dirname(path)
>

There's some confusion about path separators in this code.

I believe that the paths you're manipulating here come from the dirstate
and are internal to Mercurial, which always uses '/' as the separator. But
I may be wrong about this.

With a few exceptions (stuff in the system's genericpath.py), you can't use
most os.path functions on Mercurial's internal names. os.path.dirname will
definitely give wrong answers for Mercurial's internal paths on Windows,
for instance.

+        dirlevel = path.count('/')
>

But if I'm wrong about the origin of these names and they're actually
filesystem names, then you should be counting os.sep here, not '/', because
'/' will not be the separator on Windows.


> +            prefixdirslash = os.path.dirname(prefix) + "/"
> +            if prefixdirslash == "/":
> +                prefixdirslash = ""
>

Another occasion to be careful with naming.


> +                if f.count('/') != dirlevel:
> +                    continue
>

And another.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20120622/e884b3f3/attachment.html>


More information about the Mercurial-devel mailing list