[PATCH 07 of 13 STABLE V4] icasefs: retry directory scan once for already invalidated cache

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Thu Dec 22 00:23:15 CST 2011


At Wed, 21 Dec 2011 18:07:44 -0600,
Matt Mackall wrote:
> 
> On Fri, 2011-12-16 at 21:27 +0900, FUJIWARA Katsunori wrote:
> > # HG changeset patch
> > # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> > # Date 1324037380 -32400
> > # Branch stable
> > # Node ID c7ba906160bde878957e030d01e9ed44213f3889
> > # Parent  748694149821b3356563a7e12b51f22b1dfdf269
> > icasefs: retry directory scan once for already invalidated cache
> > 
> > some hg operation (e.g.: qpush) create new files after first
> > dirstate.walk()-ing, and it invalidates _fspathcache for fspath().
> 
> Ok, this was very confusing. When you say "invalidates", you don't mean
> "actively destroys the cache to make sure the cache isn't stale" (what
> is usually mean by "invalidate" when talking about a cache), you mean
> "makes the cache incorrect".

Oops, sorry for confusing description.

> +    def find(p, contents):
> +        lenp = len(p)
> +        for n in contents:
> +            if lenp == len(n) and normcase(n) == p:
> +                return n
> +        return None
> 
> Note that we'll eventually have to fix this because normcase can
> actually change the length of filenames. For instance, "ı" (c4 b1) to
> "I" (49) on NTFS and decomposition and escaping issues on HFS+.

I confirmed that creation of file named as "\xc3 \xa0" in UTF-8 can
get decomposed "\x61 \xcc \x80" result of "os.listdir()" on Mac OS
X. So, I think that each elements of os.listdir() result are already
normalized except for case, and length compare is still valid.

But I did not consider normalizing situation like "c4 b1" to "49" on
NTFS.

Should simple compare between "p" and "normcase(n)" without length
check be done ?

By the way, that NTFS behavior seems to depend on language
configuration of NTOS. On my Japanese(= cp932) Windows 7:

    - "\u0131" ("\xc4 \xb1" in UTF-8) created by Unicode API is:

        - normalized into "\x3f" (= "?") in byte API
          (* "no corresponded character" in cp932)
        - still "\u0131" in Unicode API 

    - "\xc4\xb1" created by byte API is:
      (* this byte sequence is valid also in cp932)

        - kept in byte API
        - normalized as "\uff84\uff71" in Unicode API

> > -        if dir not in _fspathcache:
> > -            _fspathcache[dir] = os.listdir(dir)
> > -        contents = _fspathcache[dir]
> > +        contents = _fspathcache.get(dir, None)
> > +        if contents is None:
> > +            contents = os.listdir(dir)
> > +            _fspathcache[dir] = contents
> 
> Not sure what this change is for. Our standard cache lookup pattern is:
> 
> if x not in cache:
>   cache[x] = f(x)
> y = cache[x]
> 
> So long as you don't put bogus None results in the cache, this works
> correctly and more quickly.

Sorry, I'm not so familier Python performance characteristic, so I
thought that my implementation could reduce looking up cost at cache
hit from twice to once (and almost all of penalty at cache-miss was
shared by "os.listdir()").

I'll post patches adjusted to described pattern.

> > -        lenp = len(part)
> > -        for n in contents:
> > -            if lenp == len(n) and normcase(n) == part:
> > -                result.append(n)
> > -                break
> > -        else:
> > -            # Cannot happen, as the file exists!
> > -            result.append(part)
> > +        found = find(part, contents)
> > +        if not found:
> > +            # retry once for the corner case: add files after dir walking
> > +            contents = os.listdir(dir)
> > +            _fspathcache[dir] = contents
> > +            found = find(part, contents)
> 
> This seems wrong: what happens if the case of one or more directory
> components also got changed by mq too?

Sorry for confusing description, again.

I meant that "once per directory" per "patch".

As you described, retry is done in some directory components, if:

  - one patch creates some files in separate directories, and/or
  - patches in which one file is created are pushed

I do not have any other ideas to fix ....

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp


More information about the Mercurial-devel mailing list