[PATCH 4 of 4] dirstate: speed up case collision checking

Joshua Redstone joshua.redstone at fb.com
Mon Jun 25 08:56:51 CDT 2012


On 06/22/2012 05:41 PM, Bryan O'Sullivan wrote:
> On Fri, Jun 22, 2012 at 4:49 PM, Joshua Redstone 
> <joshua.redstone at fb.com <mailto:joshua.redstone at fb.com>> wrote:
>
>
>     scmutil.casecollisionauditor is slow for large repos.  Switch to a
>     faster
>     implementation based on reducing the fraction of all files that
>     are scanned
>     and caching the results of case conversion lazily rather than
>     precomputing.
>
>
> Nice idea. Do you happen to know how much difference it makes?
The two optimizations in this patch series reduce add time by over 50%.  
Before the series, time in add was split roughly equally between this 
case collision code and the construction of _dirs.

>
>     -        cca = scmutil.casecollisionauditor(ui, abort, wctx)
>     +        cca =
>     scmutil.casecollisionauditor(repo.dirstate.sortedfiles())
>
>
> Ah, so this is where sortedfiles was coming from in that previous patch.
>
> I'm not sure it helps to have both sortedfiles and _sortedfiles. Why 
> not just have one here? Just following the pattern used for _dirs and 
> dirs?
>
I was following the pattern used for dirs().  I can get rid of sortedfiles()

> +    def collides(self, path):
>
>     +        """
>     +        Check if path has a case collision with another file, either
>     +        one we've checked before or one in a sorted list of files.
>     +        Returns true iff there is a collision.
>     +        """
>     +        dirn = os.path.dirname(path)
>
>
> There's some confusion about path separators in this code.
>
> I believe that the paths you're manipulating here come from the 
> dirstate and are internal to Mercurial, which always uses '/' as the 
> separator. But I may be wrong about this.
>
> With a few exceptions (stuff in the system's genericpath.py), you 
> can't use most os.path functions on Mercurial's internal names. 
> os.path.dirname will definitely give wrong answers for Mercurial's 
> internal paths on Windows, for instance.
>
>     +        dirlevel = path.count('/')
>
>
> But if I'm wrong about the origin of these names and they're actually 
> filesystem names, then you should be counting os.sep here, not '/', 
> because '/' will not be the separator on Windows.

Given that dirstate._finddirs is reasoning about '/' as a separator, I 
presume something converts to '/' internally.  You're probably still 
right about os.path routines not being portable, though.  I'll get rid 
of them.
>
>     +            prefixdirslash = os.path.dirname(prefix) + "/"
>     +            if prefixdirslash == "/":
>     +                prefixdirslash = ""
>
>
> Another occasion to be careful with naming.
>
>     +                if f.count('/') != dirlevel:
>     +                    continue
>
>
> And another.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20120625/e640ace9/attachment.html>


More information about the Mercurial-devel mailing list