[PATCH STABLE V2] i18n: fix case folding problem with problematic encodings

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Wed Nov 30 00:24:52 CST 2011


At Tue, 29 Nov 2011 15:38:30 -0600,
Matt Mackall wrote:
> 
> On Wed, 2011-11-30 at 05:24 +0900, FUJIWARA Katsunori wrote:
> > # HG changeset patch
> > # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> > # Date 1322598040 -32400
> > # Branch stable
> > # Node ID 5bf954f0303aefbcbfc2eefbefc5d7e9f95b98a7
> > # Parent  e387e760b207383c961ed8accd35583791a33bb0
> > i18n: fix case folding problem with problematic encodings
> > 
> > changeset 28e98a8b173d, which addressed the case folding problem
> > with problematic encodings, was not sufficient.
> > 
> > this patch corrects the defects remaining in that fix.
> 
> Eep, way too much in one patch. Each of these bullet points ought to be
> its own patch.
> 
> >   - switch internal format from str to unicode for "util.fspath()"
> 
> Broken broken broken on Linux. You can have _any bytes except null and /
> in a valid Unix filename_, which means they can't be assumed to be
> decodable in any encoding, let alone the current user's personal
> encoding. Sensible users will use UTF-8 and UTF-8 only and only exchange
> files with other people using UTF-8, but there's no guarantee that users
> are sensible.
> 
> (NTFS has a related issue: filenames can be arbitrary 16-bit strings,
> and needn't map into the valid UTF-16 codepoint space.)

Thank you for your comment. I'll rewrite the patch following your
suggestions.
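
To make sure I follow the point about undecodable names, here is a
minimal sketch (Python 3 syntax; "decodes_cleanly" is a hypothetical
helper, not existing Mercurial code). It only shows that a legal Unix
filename need not be valid in a given encoding, and that a lossy decode
does not give the original bytes back:

```python
def decodes_cleanly(name: bytes, encoding: str) -> bool:
    """Return True if the raw filename bytes are valid in `encoding`."""
    try:
        name.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# The same visible name stored as UTF-8 and as Latin-1 bytes.
# Both contain no NUL and no "/", so both are legal Unix filenames.
utf8_name = b"caf\xc3\xa9.txt"
latin1_name = b"caf\xe9.txt"

print(decodes_cleanly(utf8_name, "utf-8"))
print(decodes_cleanly(latin1_name, "utf-8"))

# Decoding with errors="replace" never raises, but it is lossy:
# re-encoding does not reproduce the bytes that are on disk.
lossy = latin1_name.decode("utf-8", errors="replace").encode("utf-8")
print(lossy == latin1_name)
```

So an internal unicode form would need some fallback for names like
the second one.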

> >   - switch from "str.lower()" to "encoding.lower()"
> 
> Again, lower() is known to be wrong for NTFS. We need to use upper().
> 
> https://blogs.msdn.com/b/michkap/archive/2005/01/16/353873.aspx
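
As a small illustration of why the two folds are not interchangeable,
the sketch below (Python 3 syntax) compares pairs of characters under
"lower()" and "upper()". It uses only Python's own Unicode tables,
which are an assumption here and are not the NTFS upcase table;
"same_under" is a hypothetical helper name.

```python
# Pairs that upper() folds together but lower() keeps apart,
# according to Python's Unicode case mappings.
samples = [
    ("i", "\u0131"),  # LATIN SMALL LETTER DOTLESS I
    ("s", "\u017f"),  # LATIN SMALL LETTER LONG S
]


def same_under(fold, a, b):
    """Compare two strings after applying the given folding function."""
    return fold(a) == fold(b)


for a, b in samples:
    print(ascii(a), ascii(b),
          "lower:", same_under(str.lower, a, b),
          "upper:", same_under(str.upper, a, b))
```

Whichever fold the filesystem uses, folding the other way can disagree
with it on names like these.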

Please confirm my understanding.

"use upper()" seems to consist of the following actions:

  1. use "upper()" (or a NEW "encoding.upper()") for "posix.normcase()"

  2. switch filename case folding from "lower()" (or
     "encoding.lower()") to "util.normcase()"

     # this is for readability/maintainability

  3. upper-case the fixed strings which are compared against
     normcase-d strings (or introduce a case-folding compare
     function?)
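
The three actions above could be sketched roughly as follows (Python 3
syntax). All names here ("normcase", "samename", "foldmap") are
hypothetical and only illustrate my understanding; they are not the
actual "util" or "posix" functions:

```python
def normcase(name: str) -> str:
    # action 1 (assumption): fold with upper() instead of lower()
    return name.upper()


def foldmap(names):
    # action 2: every caller folds through normcase() rather than
    # calling lower() directly, so the fold is defined in one place
    return {normcase(n): n for n in names}


def samename(a: str, b: str) -> bool:
    # action 3 (alternative): a case-folding compare function, so that
    # fixed strings do not have to be written in upper case by hand
    return normcase(a) == normcase(b)


ondisk = foldmap(["README", "Makefile"])
print(ondisk[normcase("readme")])
print(samename(".hg", ".HG"))
```

With a compare function like "samename", a fixed string such as ".hg"
can stay as written in the source.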

But "os.path.normcase()" in native Windows Python lowercases the given
string, so comparing its result with an upper-cased string seems to
cause unexpected failures.

# using Python 2.7.2 on Windows 7 Japanese environment
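
The mismatch I am worried about can be reproduced with "ntpath", which
is the module behind "os.path" on Windows and can be imported on any
platform. This is a sketch in Python 3 syntax with an ASCII-only name,
not the exact session from my environment:

```python
import ntpath

name = "Foo/BAR.txt"

# Windows-style normcase lowercases and converts "/" to "\".
folded = ntpath.normcase(name)
print(folded)

# A string folded with upper() never matches the normcase-d one.
upper_folded = name.upper().replace("/", "\\")
print(folded == upper_folded)
```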

Should I also use "upper()" explicitly as normcase in the Windows
environment? Or do I misunderstand your suggestion?


----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp

