[PATCH STABLE V2] i18n: fix case folding problem with problematic encodings

Tue Nov 29 15:38:30 CST 2011

On Wed, 2011-11-30 at 05:24 +0900, FUJIWARA Katsunori wrote:
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> # Date 1322598040 -32400
> # Branch stable
> # Node ID 5bf954f0303aefbcbfc2eefbefc5d7e9f95b98a7
> # Parent  e387e760b207383c961ed8accd35583791a33bb0
> i18n: fix case folding problem with problematic encodings
> 
> changeset 28e98a8b173d for case folding problem with problematic
> encoding was not enough.
> 
> this patch covers up a fault of fix in it.

Eep, way too much in one patch. Each of these bullet points ought to be
its own patch.

>   - switch internal format from str to unicode for "util.fspath()"

Broken broken broken on Linux. You can have _any bytes except NUL and /
in a valid Unix filename_, which means they can't be assumed to be
decodable in any encoding, let alone the current user's personal
encoding. Sensible users will use UTF-8 exclusively and only exchange
files with other people using UTF-8, but there's no guarantee that users
are sensible.

(NTFS has a related issue: filenames can be arbitrary sequences of
16-bit units, and needn't be valid UTF-16, e.g. unpaired surrogates.)
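
To make the decoding problem concrete, here is a minimal Python 3 sketch
(not Mercurial code; `decodable` is a hypothetical helper invented for
this illustration). It shows a perfectly legal Unix filename that does
not decode as UTF-8, and a 16-bit sequence that is not valid UTF-16:

```python
def decodable(name, encoding):
    """Return True if the raw filename bytes decode cleanly in `encoding`.

    Hypothetical helper for illustration only.
    """
    try:
        name.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# A name written by someone using Latin-1: legal on Unix, since it
# contains neither NUL nor '/'.
latin1_name = b"caf\xe9.txt"

# A lone high surrogate (0xD800) followed by 'a', as little-endian
# 16-bit units.
lone_surrogate = b"\x00\xd8a\x00"

print(decodable(latin1_name, "latin-1"))      # decodes in its own encoding
print(decodable(latin1_name, "utf-8"))        # but not in the user's UTF-8
print(decodable(lone_surrogate, "utf-16-le"))
```

Any code path that converts names to unicode before comparing them has
to decide what to do when this returns False, rather than assume it
never happens.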

>   - switch from "str.lower()" to "encoding.lower()"

Again, lower() is known to be wrong for NTFS, which matches names via an
uppercasing table. We need to use upper().

https://blogs.msdn.com/b/michkap/archive/2005/01/16/353873.aspx
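
A small Python 3 sketch of why the choice of direction matters. It uses
Python's own str.lower() and str.upper(), which follow the Unicode case
mappings and are _not_ the table NTFS uses, so treat it only as evidence
that the two foldings produce different equivalence classes. The helper
names are made up for the example:

```python
def same_by_lower(a, b):
    """Compare two names after folding both to lowercase."""
    return a.lower() == b.lower()


def same_by_upper(a, b):
    """Compare two names after folding both to uppercase."""
    return a.upper() == b.upper()


kelvin = "\u212a"   # KELVIN SIGN: lowercases to 'k', has no uppercase mapping
long_s = "\u017f"   # LATIN SMALL LETTER LONG S: uppercases to 'S'

# The two foldings disagree in both directions.
print(same_by_lower(kelvin, "k"), same_by_upper(kelvin, "k"))
print(same_by_lower(long_s, "s"), same_by_upper(long_s, "s"))
```

If the filesystem folds one way and we fold the other, we will report
collisions it doesn't see and miss ones it does.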

-- 
Mathematics is the supreme nostalgia of our time.