Performance regression on case-insensitive filesystems (Windows)
FUJIWARA Katsunori
foozy at lares.dti.ne.jp
Wed Jul 18 03:48:46 CDT 2012
At Wed, 18 Jul 2012 01:00:12 +0200,
Martin Geisler wrote:
>
> Hi guys,
>
> I'm looking into why Mercurial is slow on a repository with 75k tracked
> files and 47k ignored files. With a warm cache, I get
>
> C:\_b\R054>hg status --time
> time: real 4.418 secs (user 2.266+0.000 sys 2.031+0.000)
>
> with 67b8cca2f12b (tip of default branch). With Mercurial 2.0 I get
>
> C:\_b\R054>hg status --time
> Time: real 3.006 secs (user 1.250+0.000 sys 1.672+0.000)
>
> The time jumps at 2ebe3d0ce91d, which introduced more careful case
> folding logic:
>
> http://selenic.com/hg/rev/2ebe3d0ce91d
Oops, my change triggered a performance regression!
> A profile of default tip looks like:
>
> http://pastebin.com/PKaqUr1p
>
> The tests are done on a Windows XP machine with a SSD.
>
> I tried to cut down the number of calls to encoding.upper, since that
> shows up in the profile with a 1.7 sec of total time. So I changed
> dirstate._foldmap so that it would call util.normcase once on the
> filenames joined with \0. However, that only removed half of the calls
> and the runtime stayed more or less the same.
>
> The other half of the util.normcase calls comes from
> dirstate._normalize, which unconditionally starts by normalizing its
> path argument to look it up in the foldmap. The _normalize method is
> called in dirstate.walk and I didn't figure out a good way to combine
> those calls.
I also tried to improve performance around "dirstate._normalize()" in
"dirstate.walk()".
"dirstate.walk()" invokes "dirstate._normalize()" in the three cases
below:
(1) from "# step 1: find all explicit files":

        for ff in files:
            nf = normalize(normpath(ff), False, True)

(2) from "# step 2: visit subdirectories":

        for f, kind, st in entries:
            nf = normalize(nd and (nd + "/" + f) or f, True, True)

(3) from "dirstate._normalize()" recursively:

        d, f = normed.rsplit('/', 1)
        d = self._normalize(d, isknown, ignoremissing, True)
First, in case (3), "d" taken from "normed" is already normalized by
"util.normcase()", so normalizing it again in the recursive
"dirstate._normalize()" call is obviously redundant.
Adding an optional "normed" argument, which holds the already-normalized
"path" (or None), avoids the redundant "util.normcase()" invocation in
this case.
========================================
@@ -414,8 +414,10 @@
             self._droppath(f)
             del self._map[f]

-    def _normalize(self, path, isknown, ignoremissing=False, exists=None):
-        normed = util.normcase(path)
+    def _normalize(self, path, isknown, ignoremissing=False, exists=None,
+                   normed=None):
+        if not normed:
+            normed = util.normcase(path)
         folded = self._foldmap.get(normed, None)
         if folded is None:
             if isknown:
@@ -427,7 +429,9 @@
                     # Maybe a path component exists
                     if not ignoremissing and '/' in path:
                         d, f = path.rsplit('/', 1)
-                        d = self._normalize(d, isknown, ignoremissing, None)
+                        nd, nf = normed.rsplit('/', 1)
+                        d = self._normalize(d, isknown, ignoremissing, None,
+                                            normed=nd)
                         folded = d + "/" + f
                     else:
                         # No path components, preserve original case
@@ -437,7 +441,8 @@
                     # against dirstate
                     if '/' in normed:
                         d, f = normed.rsplit('/', 1)
-                        d = self._normalize(d, isknown, ignoremissing, True)
+                        d = self._normalize(d, isknown, ignoremissing, True,
+                                            normed=d)
                         r = self._root + "/" + d
                         folded = d + "/" + util.fspath(f, r)
                     else:
========================================
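The effect of case (3) can be seen in isolation with a small sketch
(hypothetical names; plain `str.upper` stands in for `util.normcase()`,
and a counter shows that case folding now runs once per path instead of
once per recursion level):

```python
# Sketch of threading an already-normalized path through a recursive
# normalize, so case folding runs once per path instead of once per
# recursion level.  upper_calls counts how often the fold runs.
upper_calls = [0]

def normcase(path):
    upper_calls[0] += 1          # stand-in for util.normcase()
    return path.upper()

def normalize(path, normed=None):
    if normed is None:
        normed = normcase(path)  # only the outermost call folds case
    if '/' in path:
        d, f = path.rsplit('/', 1)
        nd, nf = normed.rsplit('/', 1)
        d = normalize(d, normed=nd)  # reuse the pre-normalized prefix
        return d + '/' + nf
    return normed

result = normalize('Foo/Bar/Baz')  # recurses twice, folds case once
```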
Then, adding the method below to "dirstate" seems to reduce the cost of
"util.normcase()" invocations by handling many files at a time:
========================================
@@ -451,6 +451,16 @@
         return folded

+    def _normalizefiles(self, files, isknown, ignoremissing, normpath=None):
+        if normpath:
+            joined = '\0'.join([normpath(f) for f in files])
+        else:
+            joined = '\0'.join(files)
+        normedfiles = util.normcase(joined)
+        for file, normed in zip(files, normedfiles.split('\0')):
+            yield file, self._normalize(file, isknown, ignoremissing,
+                                        normed=normed)
+
     def normalize(self, path, isknown=False, ignoremissing=False):
         '''
         normalize the case of a pathname when on a casefolding filesystem
========================================
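The batching relies on '\0' never appearing in tracked file names, so
one case-folding call over the joined string splits back into the
per-file results. A self-contained sketch of the idea (again with
`str.upper` standing in for `util.normcase()`):

```python
def normcase(s):
    # stand-in for util.normcase(); the real one is encoding-aware
    return s.upper()

def normalizefiles(files):
    # one case-folding call for the whole batch instead of one per
    # file; '\0' cannot occur in file names, so join/split is safe
    joined = normcase('\0'.join(files))
    for f, normed in zip(files, joined.split('\0')):
        yield f, normed

pairs = list(normalizefiles(['a/B.txt', 'C/d.txt']))
# pairs == [('a/B.txt', 'A/B.TXT'), ('C/d.txt', 'C/D.TXT')]
```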
And "normalizefiles" is initialized in "dirstate.walk()" as follows:
========================================
@@ -609,9 +619,11 @@
         if not exact and self._checkcase:
             normalize = self._normalize
+            normalizefiles = self._normalizefiles
             skipstep3 = False
         else:
             normalize = lambda x, y, z: x
+            normalizefiles = lambda x, y, z, f=None: [(e, f and f(e) or e)
+                                                      for e in x]

         files = sorted(match.files())
         subrepos.sort()
========================================
In case (1), this can be used as:
========================================
@@ -631,8 +643,7 @@
             results['.hg'] = None

         # step 1: find all explicit files
-        for ff in files:
-            nf = normalize(normpath(ff), False, True)
+        for ff, nf in normalizefiles(files, False, True, normpath):
             if nf in results:
                 continue
========================================
And in case (2):
========================================
@@ -672,8 +683,10 @@
             skip = None
             if nd == '.':
                 nd = ''
+                ndjoin = lambda f: f
             else:
                 skip = '.hg'
+                ndjoin = lambda f: nd + "/" + f
             try:
                 entries = listdir(join(nd), stat=True, skip=skip)
             except OSError, inst:
@@ -681,8 +694,9 @@
                     fwarn(nd, inst.strerror)
                     continue
                 raise
-            for f, kind, st in entries:
-                nf = normalize(nd and (nd + "/" + f) or f, True, True)
+            files = [ndjoin(f) for f, kind, st in entries]
+            for (f, kind, st), (ff, nf) in zip(
+                entries, normalizefiles(files, True, True)):
                 if nf not in results:
                     if kind == dirkind:
                         if not ignore(nf):
========================================
Here are performance results with a repo containing 4.4K files on
Windows 7 + HDD:

(A) hg 2.0:
    time: real 3.428 secs (user 2.527+0.000 sys 0.905+0.000)
(B) 67b8cca2f12b (default) + "normcase()" applied in bulk to the files
    stored into the foldmap:
    time: real 4.174 secs (user 3.151+0.000 sys 0.998+0.000)
(C) (B) + (1) + (2) + (3):
    time: real 4.008 secs (user 3.120+0.000 sys 0.889+0.000)
Hmm, I get a little improvement over (B), but not enough.
BTW, should "itertools.izip" (or a similar hand-rolled utility) be used
instead of the standard "zip()" for better memory efficiency?
And does avoiding the creation of large intermediate lists also
contribute to performance?
I don't know the right balance between performance and resource
consumption in Python programming.
> Combining the calls ought to pay off: I saved all 75k filenames in a
> file and timed how long it takes to decode, uppercase, and encode them:
>
> C:\_b\R054>python -m timeit -s "m = open('m.txt').read()"
> "m.decode('cp1252').upper().encode('cp1252')"
> 10 loops, best of 3: 173 msec per loop
>
> That's roughly what encoding.upper does on a per-file basis right now.
> The manifest file is 7.4 MB, so decoding large amounts of text is pretty
> fast when you do it in bulk.
>
> Finally, making util.checkcase always return True (to indicate a case
> sensitive filesystem), brings the time down to just
>
> C:\_b\R054>hg status --time
> time: real 2.753 secs (user 1.312+0.000 sys 1.359+0.000)
>
> I would be interested in hearing if anybody has some good ideas for
> speeding up status again!
Then, I found in the profile that "windows.statfiles()" costs much more
than it did in 2.0.
With hg 2.0:

    CallCount  Recursive  Total(ms)  Inline(ms)  module:lineno(function)
        44097          0     3.0398      0.1703  mercurial.windows:214(statfiles)

but with the improved version (C):

        44097          0     3.5576      0.1831  mercurial.windows:219(statfiles)
7f01ad702405 (also mine!) caused this performance regression: that
change uses "windows.normcase()" instead of "os.path.normcase()" in
"windows.py", because "os.path.normcase()" (which just lowercases) is
not correct for some encodings.
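The hazard with plain byte lowering is easy to reproduce (the byte pair
below is only an illustration): in a multibyte encoding such as cp932,
the second byte of a character can land in the ASCII A-Z range, so
lowercasing the raw bytes rewrites it into a different character:

```python
# In cp932 (Japanese Windows), the katakana 'ア' is the byte pair
# 0x83 0x41, whose second byte is ASCII 'A'.  Lowercasing the raw
# bytes rewrites it to 0x83 0x61, which decodes as a *different*
# katakana character, silently corrupting the file name.
raw = b'\x83\x41'        # one cp932 character
lowered = raw.lower()    # what a naive byte-wise normcase would do
corrupted = raw.decode('cp932') != lowered.decode('cp932')
```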
And I tried to "normcase()" files in bulk in "windows.statfiles()"
like below:
========================================
@@ -220,16 +220,16 @@
     '''Stat each file in files and yield stat or None if file does not exist.
     Cluster and cache stat per directory to minimize number of OS stat calls.'''
     dircache = {} # dirname -> filename -> status | None if file does not exist
-    for nf in files:
-        nf = normcase(nf)
+    for nf in normcase('\0'.join(files)).split('\0'):
         dir, base = os.path.split(nf)
         if not dir:
             dir = '.'
         cache = dircache.get(dir, None)
         if cache is None:
             try:
-                dmap = dict([(normcase(n), s)
-                             for n, k, s in osutil.listdir(dir, True)])
+                ld = osutil.listdir(dir, True)
+                nns = normcase('\0'.join([n for n, k, s in ld])).split('\0')
+                dmap = dict([(nn, s) for (n, k, s), nn in zip(ld, nns)])
             except OSError, err:
                 # handle directory not found in Python version prior to 2.5
                 # Python <= 2.4 returns native Windows code 3 in errno
========================================
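The per-directory caching that "statfiles()" is built around can be
sketched portably (a hypothetical simplification: `os.listdir` plus
`os.lstat` instead of `osutil.listdir`, and no case folding), showing
how one directory listing serves every file in that directory:

```python
import os
import tempfile

def statfiles(files):
    '''Yield a stat result per file, or None if it does not exist,
    listing each directory only once.  Portable sketch of the idea;
    the real windows.statfiles() gets stat data straight from
    osutil.listdir() and case-folds the names.'''
    dircache = {}  # dirname -> {filename: stat result}
    for nf in files:
        d, base = os.path.split(nf)
        d = d or '.'
        cache = dircache.get(d)
        if cache is None:
            try:
                cache = dict((n, os.lstat(os.path.join(d, n)))
                             for n in os.listdir(d))
            except OSError:
                cache = {}  # missing/unreadable dir: lookups all miss
            dircache[d] = cache
        yield cache.get(base)

# usage: stat two files in a scratch directory, one of which exists
tmp = tempfile.mkdtemp()
open(os.path.join(tmp, 'a.txt'), 'w').close()
sts = list(statfiles([os.path.join(tmp, 'a.txt'),
                      os.path.join(tmp, 'missing.txt')]))
```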
After this change, the performance results are:

(A) hg 2.0:
    time: real 3.428 secs (user 2.527+0.000 sys 0.905+0.000)
(B) 67b8cca2f12b (default) + "normcase()" applied in bulk to the files
    stored into the foldmap:
    time: real 4.174 secs (user 3.151+0.000 sys 0.998+0.000)
(C) (B) + (1) + (2) + (3):
    time: real 4.008 secs (user 3.120+0.000 sys 0.889+0.000)
(D) (C) + the improvement in "windows.statfiles()":
    time: real 3.740 secs (user 2.839+0.000 sys 0.905+0.000)
----------------------------------------------------------------------
[FUJIWARA Katsunori] foozy at lares.dti.ne.jp