Performance regression on case-insensitive filesystems (Windows)

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Wed Jul 18 03:48:46 CDT 2012


At Wed, 18 Jul 2012 01:00:12 +0200,
Martin Geisler wrote:
> 
> Hi guys,
> 
> I'm looking into why Mercurial is slow on a repository with 75k tracked
> files and 47k ignored files. With a warm cache, I get
> 
>   C:\_b\R054>hg status --time
>   time: real 4.418 secs (user 2.266+0.000 sys 2.031+0.000)
> 
> with 67b8cca2f12b (tip of default branch). With Mercurial 2.0 I get
> 
>   C:\_b\R054>hg status --time
>   Time: real 3.006 secs (user 1.250+0.000 sys 1.672+0.000)
> 
> The time jumps at 2ebe3d0ce91d, which introduced more careful case
> folding logic:
> 
>   http://selenic.com/hg/rev/2ebe3d0ce91d

Oops, my change triggered a performance regression!

> A profile of default tip looks like:
> 
>   http://pastebin.com/PKaqUr1p
> 
> The tests are done on a Windows XP machine with an SSD.
> 
> I tried to cut down the number of calls to encoding.upper, since that
> shows up in the profile with 1.7 sec of total time. So I changed
> dirstate._foldmap so that it would call util.normcase once on the
> filenames joined with \0. However, that only removed half of the calls
> and the runtime stayed more or less the same.
> 
> The other half of the util.normcase calls comes from
> dirstate._normalize, which unconditionally starts by normalizing its
> path argument to look it up in the foldmap. The _normalize method is
> called in dirstate.walk and I didn't figure out a good way to combine
> those calls.
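
(For reference, a minimal standalone sketch of the "\0"-join idea
described above; "buildfoldmap" and the ".upper()" stand-in for
"util.normcase()" are made up for illustration, this is not Martin's
actual patch:)

    ========================================
    def buildfoldmap(dmap):
        # dmap: dirstate map, filename -> entry (contents unused here)
        names = dmap.keys()
        # one bulk case-folding call instead of one call per file;
        # .upper() stands in for util.normcase()
        normed = '\0'.join(names).upper().split('\0')
        foldmap = {}
        for name, norm in zip(names, normed):
            foldmap[norm] = name
        return foldmap

    buildfoldmap({'Foo': 1, 'bar/Baz': 2})
    # -> {'FOO': 'Foo', 'BAR/BAZ': 'bar/Baz'}
    ========================================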

I also tried to improve performance around "dirstate._normalize()" in
"dirstate.walk()".

"dirstate.walk()" invokes "dirstate._normalize()" in the three cases
below:

  (1) from "# step 1: find all explicit files"

        for ff in files:
            nf = normalize(normpath(ff), False, True)

  (2) from "# step 2: visit subdirectories"

        for f, kind, st in entries:
            nf = normalize(nd and (nd + "/" + f) or f, True, True)

  (3) from "dirstate._normalize()" recursively

        d, f = normed.rsplit('/', 1)
        d = self._normalize(d, isknown, ignoremissing, True)

First, in case (3), "d" comes from "normed", which is already
normalized by "util.normcase()", so normalizing it again in the
recursive "dirstate._normalize()" call is obviously redundant.

Adding an optional "normed" argument, which holds the already
normalized "path" (or None), avoids this redundant "util.normcase()"
invocation:

    ========================================
    @@ -414,8 +414,10 @@
                 self._droppath(f)
                 del self._map[f]
     
    -    def _normalize(self, path, isknown, ignoremissing=False, exists=None):
    -        normed = util.normcase(path)
    +    def _normalize(self, path, isknown, ignoremissing=False, exists=None,
    +                   normed=None):
    +        if not normed:
    +            normed = util.normcase(path)
             folded = self._foldmap.get(normed, None)
             if folded is None:
                 if isknown:
    @@ -427,7 +429,9 @@
                         # Maybe a path component exists
                         if not ignoremissing and '/' in path:
                             d, f = path.rsplit('/', 1)
    -                        d = self._normalize(d, isknown, ignoremissing, None)
    +                        nd = normed.rsplit('/', 1)[0]
    +                        d = self._normalize(d, isknown, ignoremissing, None,
    +                                            normed=nd)
                             folded = d + "/" + f
                         else:
                             # No path components, preserve original case
    @@ -437,7 +441,8 @@
                         # against dirstate
                         if '/' in normed:
                             d, f = normed.rsplit('/', 1)
    -                        d = self._normalize(d, isknown, ignoremissing, True)
    +                        d = self._normalize(d, isknown, ignoremissing, True,
    +                                            normed=d)
                             r = self._root + "/" + d
                             folded = d + "/" + util.fspath(f, r)
                         else:
    ========================================

Then, adding the method below to "dirstate" reduces the cost of
"util.normcase()" invocations by handling many files at a time:

    ========================================
    @@ -451,6 +451,17 @@
     
             return folded
     
    +    def _normalizefiles(self, files, isknown, ignoremissing, normpath=None):
    +        if normpath:
    +            # normpath first, so the "path" passed to _normalize()
    +            # matches the "normed" computed from it below
    +            files = [normpath(f) for f in files]
    +        # one util.normcase() call covers all the names at once
    +        normedfiles = util.normcase('\0'.join(files))
    +        for f, normed in zip(files, normedfiles.split('\0')):
    +            yield f, self._normalize(f, isknown, ignoremissing,
    +                                     normed=normed)
    +
         def normalize(self, path, isknown=False, ignoremissing=False):
             '''
             normalize the case of a pathname when on a casefolding filesystem
    ========================================

And "normalizefiles" in "dirstate.walk()" is initialized in the way below:

    ========================================
    @@ -609,9 +619,12 @@
     
             if not exact and self._checkcase:
                 normalize = self._normalize
    +            normalizefiles = self._normalizefiles
                 skipstep3 = False
             else:
                 normalize = lambda x, y, z: x
    +            normalizefiles = lambda x, y, z, f=None: [(e, f and f(e) or e)
    +                                                      for e in x]
     
             files = sorted(match.files())
             subrepos.sort()
    ========================================

In case (1), this can be used as:

    ========================================
    @@ -631,8 +643,7 @@
             results['.hg'] = None
     
             # step 1: find all explicit files
    -        for ff in files:
    -            nf = normalize(normpath(ff), False, True)
    +        for ff, nf in normalizefiles(files, False, True, normpath):
                 if nf in results:
                     continue
    ========================================
    
And in case (2):

    ========================================
    @@ -672,8 +683,10 @@
                 skip = None
                 if nd == '.':
                     nd = ''
    +                ndjoin = lambda f: f
                 else:
                     skip = '.hg'
    +                ndjoin = lambda f: nd + "/" + f
                 try:
                     entries = listdir(join(nd), stat=True, skip=skip)
                 except OSError, inst:
    @@ -681,8 +694,9 @@
                         fwarn(nd, inst.strerror)
                         continue
                     raise
    -            for f, kind, st in entries:
    -                nf = normalize(nd and (nd + "/" + f) or f, True, True)
    +            fs = [ndjoin(f) for f, kind, st in entries]
    +            for (f, kind, st), (ff, nf) in zip(entries,
    +                                               normalizefiles(fs, True, True)):
                     if nf not in results:
                         if kind == dirkind:
                             if not ignore(nf):
    ========================================

Here are the performance results with a repo containing 44K files on
Windows 7 + HDD:

  (A) hg 2.0:

      time: real 3.428 secs (user 2.527+0.000 sys 0.905+0.000)

  (B) 67b8cca2f12b (default) + bulk "normcase()" of the files stored
      into the foldmap:

      time: real 4.174 secs (user 3.151+0.000 sys 0.998+0.000)

  (C) (B) + (1) + (2) + (3):

      time: real 4.008 secs (user 3.120+0.000 sys 0.889+0.000)

Hmmm, I get a little improvement over (B), but not nearly enough.


BTW, should "itertools.izip" (or a similar hand-rolled utility) be
used instead of the standard "zip()" for lower resource consumption?
And does avoiding the creation of large lists also contribute to
performance?

I don't know where the right balance between performance and resource
consumption lies in Python programming.
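
(For what it's worth, a tiny standalone sketch of the difference; the
names and the ".upper()" stand-in for "util.normcase()" are made up.
On Python 2, "zip()" materializes the whole list of pairs up front,
while "itertools.izip" yields them lazily:)

    ========================================
    from itertools import izip

    names = ['Foo/Bar', 'baz/QUX']
    # one bulk case-folding call; .upper() stands in for util.normcase()
    normed = '\0'.join(names).upper().split('\0')
    # izip yields pairs one at a time; zip() would first build a
    # len(names)-sized temporary list of tuples
    for name, norm in izip(names, normed):
        print name, norm
    ========================================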


> Combining the calls ought to pay off: I saved all 75k filenames in a
> file and timed how long it takes to decode, uppercase, and encode them:
> 
> C:\_b\R054>python -m timeit -s "m = open('m.txt').read()"
>                                "m.decode('cp1252').upper().encode('cp1252')"
> 10 loops, best of 3: 173 msec per loop
> 
> That's roughly what encoding.upper does on a per-file basis right now.
> The manifest file is 7.4 MB, so decoding large amounts of text is pretty
> fast when you do it in bulk.
> 
> Finally, making util.checkcase always return True (to indicate a case
> sensitive filesystem), brings the time down to just
> 
>   C:\_b\R054>hg status --time
>   time: real 2.753 secs (user 1.312+0.000 sys 1.359+0.000)
>
> I would be interested in hearing if anybody has some good ideas for
> speeding up status again!

Then I found that "windows.statfiles()" costs much more than it did
in 2.0, according to the profile results.

With hg 2.0:

   CallCount Recursive  Total(ms) Inline(ms) module:lineno(function)
       44097   0        3.0398    0.1703     mercurial.windows:214(statfiles)

But with the improved version (C):

       44097   0        3.5576    0.1831     mercurial.windows:219(statfiles)

7f01ad702405 (also mine!) caused this performance regression: that
change made "windows.py" use the encoding-aware "windows.normcase()"
instead of "os.path.normcase()", because "os.path.normcase()" (just
bytewise lowering) is not correct for some encodings.
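
(Roughly, the difference looks like the sketch below; "cp1252" is just
an example encoding here, not necessarily what the real code uses:)

    ========================================
    def normcase_bytewise(path):
        # what os.path.normcase-style folding amounts to: bytewise
        # lowering, cheap but wrong for some double-byte encodings
        return path.lower()

    def normcase_aware(path, enc='cp1252'):
        # encoding-aware folding in the spirit of 7f01ad702405:
        # decode, upper-case, re-encode (more work per call)
        return path.decode(enc).upper().encode(enc)
    ========================================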

So I tried to "normcase()" the files in bulk in "windows.statfiles()",
as below:

    ========================================
    @@ -220,16 +220,19 @@
         '''Stat each file in files and yield stat or None if file does not exist.
         Cluster and cache stat per directory to minimize number of OS stat calls.'''
         dircache = {} # dirname -> filename -> status | None if file does not exist
    -    for nf in files:
    -        nf  = normcase(nf)
    +    if not files:
    +        return
    +    # one normcase() call for the whole list of names
    +    for nf in normcase('\0'.join(files)).split('\0'):
             dir, base = os.path.split(nf)
             if not dir:
                 dir = '.'
             cache = dircache.get(dir, None)
             if cache is None:
                 try:
    -                dmap = dict([(normcase(n), s)
    -                    for n, k, s in osutil.listdir(dir, True)])
    +                ld = osutil.listdir(dir, True)
    +                nns = normcase('\0'.join([n for n, k, s in ld])).split('\0')
    +                dmap = dict([(nn, s) for (n, k, s), nn in zip(ld, nns)])
                 except OSError, err:
                     # handle directory not found in Python version prior to 2.5
                     # Python <= 2.4 returns native Windows code 3 in errno
    ========================================

After this change, the performance results are:

  (A) hg 2.0:

      time: real 3.428 secs (user 2.527+0.000 sys 0.905+0.000)

  (B) 67b8cca2f12b (default) + bulk "normcase()" of the files stored
      into the foldmap:

      time: real 4.174 secs (user 3.151+0.000 sys 0.998+0.000)

  (C) (B) + (1) + (2) + (3):

      time: real 4.008 secs (user 3.120+0.000 sys 0.889+0.000)

  (D) (C) + improvement in "windows.statfiles()":

      time: real 3.740 secs (user 2.839+0.000 sys 0.905+0.000)


----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp

