[PATCH STABLE V2] i18n: fix case folding problem with problematic encodings
FUJIWARA Katsunori
foozy at lares.dti.ne.jp
Fri Dec 2 10:39:20 CST 2011
At Thu, 01 Dec 2011 10:55:11 -0600,
Matt Mackall wrote:
> > As I understand it:
> >
> > "util.normcase()" should abstract case folding policy, so
> > normcase-ed result should not be expected to be either lower or
> > upper.
> >
> >
> > Then, I categorize lower/upper-ing points in current implementation.
> >
> > A. compare between filenames (directly or in-directly)
> >
> > "util.normcase()" should be applied on them.
>
> util.normcase should only be applied after we've determined that we're
> on a case-insensitive filesystem. We've done a pretty good job of
> restricting its usage to dirstate.py, which carefully caches all the
> relevant bits with _foldmap and normalize.
>
> You should spend a while understanding dirstate.normalize. It's not
> enough to be case-insensitive, we also have to be case-preserving.
As I understand the "dirstate._foldmap" functionality:

  - dirstate stores "case-preserved" names
    (hereafter, "logical names")

  - "util.normcase()" should emulate the case folding policy of the
    target case-insensitive filesystem

    The case folding policy depends on the filesystem implementation,
    so we should not expect the result to be either lower-cased or
    upper-cased.

  - "dirstate._foldmap" maps "util.normcase()"-ed (= case-folded)
    names to "case-preserved" ones

    - if the name is already tracked, this mapping gives the original
      case-preserved name (= "logical name")

    - otherwise, the "case-preserved" name is obtained from the
      filesystem layer: case information is preserved on many
      case-insensitive filesystems

      The name obtained this way may then be stored into dirstate and
      become a "logical name".

Almost all filename comparisons should be done on "logical names" (for
which either lower- or upper-casing can be used), so use of
"util.normcase()" can be limited to the "dirstate.normalize" family,
as you described in the reply above.

Do I understand correctly?
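
To make the mapping described above concrete, here is a minimal sketch
of the idea. It is not the actual dirstate code: "FoldMap", its
"normalize" method, the "fs_lookup" callback, and the use of plain
lower-casing as a stand-in for "util.normcase()" are all assumptions
made only for illustration.

```python
def normcase(name):
    # Stand-in for util.normcase(). Assumption: fold by lower-casing;
    # a real filesystem may use a different folding policy.
    return name.lower()


class FoldMap:
    """Hypothetical, simplified model of the folded-name mapping."""

    def __init__(self, tracked):
        # Map each case-folded name to its case-preserved "logical name".
        self._map = {normcase(f): f for f in tracked}

    def normalize(self, name, fs_lookup=None):
        folded = normcase(name)
        if folded in self._map:
            # Already tracked: return the original logical name.
            return self._map[folded]
        if fs_lookup is not None:
            # Not tracked: ask the filesystem layer (hypothetical
            # callback) for the case-preserved spelling.
            return fs_lookup(name)
        # No better information: keep the name as given.
        return name


fm = FoldMap(["README.txt", "src/Main.py"])
tracked_name = fm.normalize("readme.TXT")
untracked_name = fm.normalize("NEW.TXT", fs_lookup=lambda n: "New.txt")
```

In this sketch, callers never see the folded form: the folded name is
used only as a lookup key, and every result is a case-preserved name.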
Then, I picked out some filename comparison points where I am less
confident in my inference about whether "util.normcase()" is needed
or not.

# especially (3) and (4) !!

  1. encoding.lower() in merge._checkcollision():
  2. encoding.lower() in scmutil.casecollisionauditor:

     These check for collisions between "logical names", so
     "util.normcase()" is not needed.

  3. os.path.normcase() in scmutil.pathauditor.__call__():
  4. os.path.normcase() in util.fspath():

     These "os.path.normcase()"-ed strings are used to invoke
     "os.lstat()" or "os.listdir()".

     So a merely lower/upper-cased name may cause problems on some
     case-insensitive filesystems, but os.path.normcase() is not
     suitable either.

     "util.fspath()" is used only on case-insensitive filesystems, so
     "util.normcase()" may be reasonable there. But
     "scmutil.pathauditor" is also used on case-sensitive filesystems.
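
As a small illustration of why os.path.normcase() is awkward for
points 3 and 4, the sketch below calls the two standard-library
modules that os.path resolves to ("posixpath" on POSIX systems,
"ntpath" on Windows). It shows only Python's own behavior, not
Mercurial code, and the sample path is made up.

```python
import ntpath
import posixpath

# Hypothetical sample path, chosen only for illustration.
sample = "Dir/ReadMe.TXT"

# On POSIX systems os.path is posixpath: normcase() returns the path
# unchanged, even if the underlying filesystem is case-insensitive.
posix_result = posixpath.normcase(sample)

# On Windows os.path is ntpath: normcase() lower-cases the path and
# also converts forward slashes to backslashes.
nt_result = ntpath.normcase(sample)

print(repr(posix_result))
print(repr(nt_result))
```

So the result depends on the platform rather than on whether the
filesystem in use is case-insensitive, which is the distinction that
matters here.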
----------------------------------------------------------------------
[FUJIWARA Katsunori] foozy at lares.dti.ne.jp