[PATCH STABLE V2] i18n: fix case folding problem with problematic encodings

Fri Dec 2 10:39:20 CST 2011

At Thu, 01 Dec 2011 10:55:11 -0600,
Matt Mackall wrote:

> > As I understand it:
> > 
> >     "util.normcase()" should abstract case folding policy, so
> >     normcase-ed result should not be expected to be either lower or
> >     upper.
> > 
> > 
> > Then, I categorize lower/upper-ing points in current implementation.
> > 
> >   A. compare between filenames (directly or in-directly)
> > 
> >      "util.normcase()" should be applied on them.
> 
> util.normcase should only be applied after we've determined that we're
> on a case-insensitive filesystem. We've done a pretty good job of
> restricting its usage to dirstate.py, which carefully caches all the
> relevant bits with _foldmap and normalize.
> 
> You should spend a while understanding dirstate.normalize. It's not
> enough to be case-insensitive, we also have to be case-preserving.

As I understand "dirstate._foldmap" functionality:

  - dirstate stores "case-preserved" names
    (called "logical names" below)

  - "util.normcase()" should emulate case folding policy in target
    case-insensitive filesystem

    the case folding policy depends on the filesystem implementation,
    so we should not expect the result to be either lower-cased or
    upper-cased.

  - "dirstate._foldmap" maps from "util.normcase()"-ed (= case folded)
    names to "case-preserved" ones

      - if it is already tracked, this mapping gives original
        case-preserved name (= "logical name")

      - otherwise, the "case-preserved" name is given by the filesystem
        layer: case information is preserved on many case-insensitive
        filesystems

        that name may then be stored into dirstate and become a
        "logical name"

Almost all filename comparisons should be done on "logical names"
(where either lower-casing or upper-casing works), so use of
"util.normcase()" can be limited to the "dirstate.normalize" family,
as you described in the reply above.
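
The mapping described above can be sketched roughly as follows. This
is only an illustration under stated assumptions, not Mercurial's
dirstate code: the class FoldMapSketch, its normalize() signature and
the fsname parameter are hypothetical, and normcase() here simply
lower-cases as a stand-in for whatever folding policy the real
"util.normcase()" applies.

```python
def normcase(path):
    # Assumption: stand-in for util.normcase(). The real folding policy
    # depends on the filesystem and need not be lower-casing.
    return path.lower()


class FoldMapSketch:
    """Hypothetical illustration of a folded-name -> logical-name map."""

    def __init__(self, tracked):
        # Keys are case-folded names, values are the case-preserved
        # "logical names" of already tracked files.
        self._foldmap = {normcase(f): f for f in tracked}

    def normalize(self, path, fsname=None):
        folded = normcase(path)
        known = self._foldmap.get(folded)
        if known is not None:
            # Already tracked: return the original case-preserved name.
            return known
        # Not tracked: prefer the name reported by the filesystem layer
        # (fsname), falling back to the name as given by the caller.
        logical = fsname if fsname is not None else path
        self._foldmap[folded] = logical
        return logical
```

For example, with "README.txt" tracked, normalize("readme.TXT") gives
back "README.txt", while an untracked name takes the case supplied via
fsname and keeps it for later lookups.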

Do I understand correctly?

Next, I picked out some filename comparison points where I am less
sure whether "util.normcase()" is needed or not.

# especially (3) and (4) !!

  1. encoding.lower() in merge._checkcollision():
  2. encoding.lower() in scmutil.casecollisionauditor:

     these check for collisions between "logical names", so
     "util.normcase()" is not needed

  3. os.path.normcase() in scmutil.pathauditor.__call__():
  4. os.path.normcase() in util.fspath():

     these "os.path.normcase()"-ed strings are used to invoke
     "os.lstat()" or "os.listdir()".

     so, a name that is simply lower-cased or upper-cased may cause
     problems on some case-insensitive filesystems, but
     os.path.normcase() is not suitable, either.

     "util.fspath()" is used only on case-insensitive filesystem, so
     "util.normcase()" may be reasonable. but "scmutil.pathauditor" is
     used also on case-sensitive filesystem.
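
One way to avoid handing a folded name to os.lstat() or os.listdir()
is to fold only for comparison and take the name actually used from
the directory listing itself. The following is a minimal sketch of
that idea, assuming a hypothetical helper stored_case() and str.lower
as a stand-in for the folding policy; it is not how "util.fspath()" or
"scmutil.pathauditor" are implemented.

```python
import os


def stored_case(dirpath, name, fold=str.lower):
    """Return the entry of dirpath that matches name under fold, or None.

    Hypothetical helper: only the comparison uses folded strings; the
    value returned is the name exactly as os.listdir() reports it.
    """
    wanted = fold(name)
    for entry in os.listdir(dirpath):
        if fold(entry) == wanted:
            return entry
    return None
```

The caller can then pass the returned entry, not the folded string, to
os.lstat(). Which folding function is appropriate on a given
filesystem is exactly the open question in (3) and (4).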

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp