RFC: safe pattern matching for problematic encoding

Adrian Buehlmann adrian at cadifra.com
Thu May 31 10:44:28 CDT 2012


On 2012-05-23 14:53, Mads wrote:
> On 23/05/12 14:38, FUJIWARA Katsunori wrote:
>> Hi, devels.
>>
>> I'm working to achieve safe pattern matching/parsing for problematic
>> encodings (e.g.: cp932), in which strings may contain '\\' as a part
>> of multi-byte characters.
> 
> Is that a part of improving hgext/win32mbcs.py ? Or how are they related?
> 
>> As you noticed, the wrapping/hooking points are widely scattered, so I
>> think that this implementation is not so good. But I don't have any
>> other ideas.
>>
>> Are there any other ideas to solve this problem ?
> 
> The only viable solution is to consistently use utf-8 inside Mercurial.
> 
>> BTW, how is "using Unicode API on Windows" plan progressing ?
>>
>>    http://www.selenic.com/pipermail/mercurial-devel/2011-December/036385.html
> 
> No progress at all. Windows users apparently do not care enough about
> the problem to contribute in any way.

The chances that anyone will succeed at getting anything accomplished
there are pretty small, although it's surely a nice black hole for
wasting your time.

Compare this to how rarely most developers need to use non-ASCII
filenames.

Not really surprising to me that Windows developers simply prefer to
stay out of this mess.
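
For readers who have not run into the cp932 problem quoted at the top,
here is a minimal Python 3 sketch of it. This is not Mercurial code and
not what win32mbcs does; the helper names (has_embedded_backslash,
naive_split, safe_split) are made up for illustration. It only shows
that splitting encoded bytes on '\\' can cut a multi-byte character in
half, while decoding first does not.

```python
# Illustration only: hypothetical helpers, not part of Mercurial.
BACKSLASH = 0x5C  # byte value of '\\'


def has_embedded_backslash(ch, encoding="cp932"):
    """True if ch encodes to several bytes and a trailing byte is 0x5C."""
    encoded = ch.encode(encoding)
    return len(encoded) > 1 and BACKSLASH in encoded[1:]


def naive_split(path_bytes):
    """Split on the raw byte 0x5C, ignoring character boundaries."""
    return path_bytes.split(b"\\")


def safe_split(path_bytes, encoding="cp932"):
    """Decode first, split on the '\\' character, then re-encode each part."""
    text = path_bytes.decode(encoding)
    return [part.encode(encoding) for part in text.split("\\")]


# U+8868 is a common kanji whose cp932 encoding is the bytes 0x95 0x5C.
KANJI = "\u8868"
path = ("dir\\" + KANJI + ".txt").encode("cp932")

print(has_embedded_backslash(KANJI))  # the second byte is 0x5C
print(naive_split(path))              # the file name is cut in two
print(safe_split(path))               # the file name stays whole
```

The decode-then-split approach only works if the encoding of the bytes
is known at every place that parses a path or pattern, which is why the
hooking points end up scattered.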

