RFC: safe pattern matching for problematic encoding

Mads mads at kiilerich.com
Wed May 23 07:53:43 CDT 2012


On 23/05/12 14:38, FUJIWARA Katsunori wrote:
> Hi, devels.
>
> I'm working on safe pattern matching/parsing for problematic
> encodings (e.g. cp932), in which strings may contain '\\' as part
> of a multi-byte character.

Is that part of improving hgext/win32mbcs.py? If not, how are the two related?
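
For readers unfamiliar with the problem in the quoted paragraph, here is a minimal sketch in Python 3 (not code from Mercurial or win32mbcs). It shows that the cp932 encoding of "表" ends in the byte 0x5C, the same value as an ASCII backslash, so a byte-level search reports a backslash that is not really there. The helper name has_real_backslash is hypothetical and exists only for this illustration.

```python
# "表" (U+8868) is encoded in cp932 as the two bytes 0x95 0x5C.
# The trail byte 0x5C has the same value as an ASCII backslash.
name = "表.txt"
raw = name.encode("cp932")

# A naive byte-level search finds a "backslash" inside the character.
naive_hit = b"\\" in raw


def has_real_backslash(data, encoding="cp932"):
    """Hypothetical helper: decode first, then search on characters."""
    return "\\" in data.decode(encoding)


print(raw)                      # the raw cp932 bytes
print(naive_hit)                # byte-level search result
print(has_real_backslash(raw))  # character-level search result
```

Decoding before matching avoids the false positive, but it has to happen at every place that inspects such strings, which is why the hook points end up scattered.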

> As you noticed, the wrapping/hooking points are widely scattered, so I
> think that this implementation is not so good. But I don't have any
> other ideas.
>
> Are there any other ideas to solve this problem?

The only viable solution is to consistently use utf-8 inside Mercurial.
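
A minimal sketch of why utf-8 helps, again in Python 3 and not taken from Mercurial: every byte of a multi-byte utf-8 sequence is 0x80 or higher, so a 0x5C byte in utf-8 data is always a real backslash. The helper to_internal and the sample path are assumptions made up for this illustration.

```python
def to_internal(raw, local_encoding="cp932"):
    """Hypothetical boundary helper: local-encoding bytes -> utf-8 bytes."""
    return raw.decode(local_encoding).encode("utf-8")


# Both "表" (0x95 0x5C) and "ソ" (0x83 0x5C) end in 0x5C when encoded as cp932.
local_path = "dir\\表\\ソ.txt".encode("cp932")
internal = to_internal(local_path)

# Splitting the cp932 bytes cuts through the middle of both characters.
parts_naive = local_path.split(b"\\")

# Splitting the utf-8 bytes only cuts at real separators.
parts_utf8 = internal.split(b"\\")

print(len(parts_naive))
print([p.decode("utf-8") for p in parts_utf8])
```

With a single conversion at the boundary, byte-level matching in the rest of the code would need no per-call wrapping.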

> BTW, how is the "using Unicode API on Windows" plan progressing?
>
>    http://www.selenic.com/pipermail/mercurial-devel/2011-December/036385.html

No progress at all. Windows users apparently do not care enough about
the problem to contribute in any way.

/Mads

