[PATCH 4 of 4] dirstate: speed up case collision checking
Joshua Redstone
joshua.redstone at fb.com
Mon Jun 25 08:56:51 CDT 2012
On 06/22/2012 05:41 PM, Bryan O'Sullivan wrote:
> On Fri, Jun 22, 2012 at 4:49 PM, Joshua Redstone
> <joshua.redstone at fb.com <mailto:joshua.redstone at fb.com>> wrote:
>
>
> scmutil.casecollisionauditor is slow for large repos. Switch to a
> faster
> implementation based on reducing the fraction of all files that
> are scanned
> and caching the results of case conversion lazily rather than
> precomputing.
>
>
> Nice idea. Do you happen to know how much difference it makes?
The two optimizations in this patch series reduce add time by over 50%.
Before the series, time in add was split roughly equally between this
case collision code and the construction of _dirs.
>
> - cca = scmutil.casecollisionauditor(ui, abort, wctx)
> + cca =
> scmutil.casecollisionauditor(repo.dirstate.sortedfiles())
>
>
> Ah, so this is where sortedfiles was coming from in that previous patch.
>
> I'm not sure it helps to have both sortedfiles and _sortedfiles. Why
> not just have one here? Just following the pattern used for _dirs and
> dirs?
>
I was following the pattern used for dirs(). I can get rid of sortedfiles()
> + def collides(self, path):
>
> + """
> + Check if path has a case collision with another file, either
> + one we've checked before or one in a sorted list of files.
> + Returns true iff there is a collision.
> + """
> + dirn = os.path.dirname(path)
>
>
> There's some confusion about path separators in this code.
>
> I believe that the paths you're manipulating here come from the
> dirstate and are internal to Mercurial, which always uses '/' as the
> separator. But I may be wrong about this.
>
> With a few exceptions (stuff in the system's genericpath.py), you
> can't use most os.path functions on Mercurial's internal names.
> os.path.dirname will definitely give wrong answers for Mercurial's
> internal paths on Windows, for instance.
>
> + dirlevel = path.count('/')
>
>
> But if I'm wrong about the origin of these names and they're actually
> filesystem names, then you should be counting os.sep here, not '/',
> because '/' will not be the separator on Windows.
Given that dirstate._finddirs is reasoning about '/' as a separator, I
presume something converts to '/' internally. You're probably still
right about os.path routines not being portable, though. I'll get rid
of them.
>
> + prefixdirslash = os.path.dirname(prefix) + "/"
> + if prefixdirslash == "/":
> + prefixdirslash = ""
>
>
> Another occasion to be careful with naming.
>
> + if f.count('/') != dirlevel:
> + continue
>
>
> And another.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20120625/e640ace9/attachment.html>
More information about the Mercurial-devel
mailing list