RFC: safe pattern matching for problematic encoding

Mads mads at kiilerich.com
Thu May 24 08:10:33 CDT 2012


On 24/05/12 14:16, FUJIWARA Katsunori wrote:
> With current implementation:
>
>      1. add the file, of which name is changed by NFD, on MacOS

So the file has the 'wrong' normalization from the beginning, even
before Mercurial sees it for the first time?

How do you imagine Mercurial could possibly 'fix' that? Do you have
general rules for an alternative normalization that can clean up the
MacOS normalization?

(Files that have non-NFD encoded names in the repo and on real unix will 
obviously have the NFD names on MacOS file systems, but Mercurial will 
preserve the original filename encoding when committing.)
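
For illustration, a minimal Python sketch of the difference between the
two forms, using only the standard unicodedata module. The sample file
name is made up, and the sketch only mimics the decomposition; it does
not claim to reproduce exactly what the MacOS file system does.

```python
import unicodedata

# Hypothetical file name, as it might be committed on Linux (NFC form).
name_nfc = unicodedata.normalize("NFC", "caf\u00e9.txt")

# Roughly what an NFD-normalizing file system would report for the same file.
name_nfd = unicodedata.normalize("NFD", name_nfc)

# The two look identical on screen but are different code point sequences.
same_text = name_nfc == name_nfd
lengths = (len(name_nfc), len(name_nfd))

# Normalizing both sides to one form makes them comparable again.
same_after_normalizing = unicodedata.normalize("NFC", name_nfd) == name_nfc
```

Here the accented letter is one code point in the NFC name and two (base
letter plus combining accent) in the NFD name, so a plain string
comparison of the two names fails.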

> # both via GUI file browsers and CUI terminals

(That non-standard markup probably does not help communicate your intent.)

> For example, "i18n/ja.po" message translation file contains 122181 non
> ascii Japanese characters and 10% of them can be changed by NFD: this
> ratio may become 15% or more in some situation.

How are the content and encoding of translation files related to MacOS
filename normalization?

As you probably know: Mercurial never touches or changes (or cares about)
file content encoding - it just reproduces it 100% reliably.

On 24/05/12 14:38, Noel Grandin wrote:
> Perhaps the first step towards solving these problems is to build 3 
> mock-filesystem libraries
> - a MacOS-type-NFD system
> - a Windows-type case-insensitive system
> - a straightforward Linux-type system
>
> And then create some unit tests using these mock-filesystem libraries 
> which expose the problems.
>
> Then everybody can test and fix without needing 3 different boxes :-)
>
> Links:
> https://launchpad.net/mockfs

We already have http://selenic.com/repo/hg/file/tip/contrib/casesmash.py .

Further work in that direction could perhaps be useful in some cases,
but it is neither a prerequisite nor sufficient for implementing
support for all the real quirks of the real platforms.
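
As a rough idea of what such a mock could look like for the NFD case,
here is a toy sketch. The NFDStore class and its methods are
hypothetical (not part of Mercurial, casesmash.py or mockfs), and it
ignores everything else a real file system does.

```python
import unicodedata


class NFDStore:
    """Toy in-memory 'file system' that stores names in NFD (hypothetical)."""

    def __init__(self):
        self._files = {}

    def write(self, name, data):
        # Names are decomposed on the way in, whatever form the caller used.
        self._files[unicodedata.normalize("NFD", name)] = data

    def read(self, name):
        # Lookups are normalized too, so either form finds the file.
        return self._files[unicodedata.normalize("NFD", name)]

    def listdir(self):
        # Listing returns the stored (NFD) names, not the names as written.
        return sorted(self._files)


fs = NFDStore()
fs.write("caf\u00e9.txt", b"x")
listed = fs.listdir()

# The name that comes back is not the name that was written.
roundtrip = listed == ["caf\u00e9.txt"]
```

A test written against something like this can show the name changing
between write and listdir, but it says nothing about the other quirks of
the real platform.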

/Mads


