[PATCH 07 of 13 STABLE V4] icasefs: retry directory scan once for already invalidated cache
FUJIWARA Katsunori
foozy at lares.dti.ne.jp
Thu Dec 22 00:23:15 CST 2011
At Wed, 21 Dec 2011 18:07:44 -0600,
Matt Mackall wrote:
>
> On Fri, 2011-12-16 at 21:27 +0900, FUJIWARA Katsunori wrote:
> > # HG changeset patch
> > # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> > # Date 1324037380 -32400
> > # Branch stable
> > # Node ID c7ba906160bde878957e030d01e9ed44213f3889
> > # Parent 748694149821b3356563a7e12b51f22b1dfdf269
> > icasefs: retry directory scan once for already invalidated cache
> >
> > some hg operation (e.g.: qpush) create new files after first
> > dirstate.walk()-ing, and it invalidates _fspathcache for fspath().
>
> Ok, this was very confusing. When you say "invalidates", you don't mean
> "actively destroys the cache to make sure the cache isn't stale" (what
> is usually mean by "invalidate" when talking about a cache), you mean
> "makes the cache incorrect".
Oops, sorry for confusing description.
> + def find(p, contents):
> + lenp = len(p)
> + for n in contents:
> + if lenp == len(n) and normcase(n) == p:
> + return n
> + return None
>
> Note that we'll eventually have to fix this because normcase can
> actually change the length of filenames. For instance, "ı" (c4 b1) to
> "I" (49) on NTFS and decomposition and escaping issues on HFS+.
I confirmed that creation of file named as "\xc3 \xa0" in UTF-8 can
get decomposed "\x61 \xcc \x80" result of "os.listdir()" on Mac OS
X. So, I think that each elements of os.listdir() result are already
normalized except for case, and length compare is still valid.
But I did not consider normalizing situation like "c4 b1" to "49" on
NTFS.
Should simple compare between "p" and "normcase(n)" without length
check be done ?
By the way, that NTFS behavior seems to depend on language
configuration of NTOS. On my Japanese(= cp932) Windows 7:
- "\u0131" ("\xc4 \xb1" in UTF-8) created by Unicode API is:
- normalized into "\x3f" (= "?") in byte API
(* "no corresponded character" in cp932)
- still "\u0131" in Unicode API
- "\xc4\xb1" created by byte API is:
(* this byte sequence is valid also in cp932)
- kept in byte API
- normalized as "\uff84\uff71" in Unicode API
> > - if dir not in _fspathcache:
> > - _fspathcache[dir] = os.listdir(dir)
> > - contents = _fspathcache[dir]
> > + contents = _fspathcache.get(dir, None)
> > + if contents is None:
> > + contents = os.listdir(dir)
> > + _fspathcache[dir] = contents
>
> Not sure what this change is for. Our standard cache lookup pattern is:
>
> if x not in cache:
> cache[x] = f(x)
> y = cache[x]
>
> So long as you don't put bogus None results in the cache, this works
> correctly and more quickly.
Sorry, I'm not so familier Python performance characteristic, so I
thought that my implementation could reduce looking up cost at cache
hit from twice to once (and almost all of penalty at cache-miss was
shared by "os.listdir()").
I'll post patches adjusted to described pattern.
> > - lenp = len(part)
> > - for n in contents:
> > - if lenp == len(n) and normcase(n) == part:
> > - result.append(n)
> > - break
> > - else:
> > - # Cannot happen, as the file exists!
> > - result.append(part)
> > + found = find(part, contents)
> > + if not found:
> > + # retry once for the corner case: add files after dir walking
> > + contents = os.listdir(dir)
> > + _fspathcache[dir] = contents
> > + found = find(part, contents)
>
> This seems wrong: what happens if the case of one or more directory
> components also got changed by mq too?
Sorry for confusing description, again.
I meant that "once per directory" per "patch".
As you described, retry is done in some directory components, if:
- one patch creates some files in separate directories, and/or
- patches in which one file is created are pushed
I do not have any other ideas to fix ....
----------------------------------------------------------------------
[FUJIWARA Katsunori] foozy at lares.dti.ne.jp
More information about the Mercurial-devel
mailing list