RFC: safe pattern matching for problematic encoding

Wed May 23 08:29:27 CDT 2012

At Wed, 23 May 2012 14:53:43 +0200,
Mads wrote:
> 
> On 23/05/12 14:38, FUJIWARA Katsunori wrote:
> > Hi, devels.
> >
> > I'm working to achieve safe pattern matching/parsing for problematic
> > encodings (e.g.: cp932), in which strings may contain '\\' as a part
> > of multi-byte characters.
> 
> Is that a part of improving hgext/win32mbcs.py ? Or how are they related?

In my patch series:

  - add some wrapping/hooking points to core code, and
  - wrap/hook them by hgext/win32mbcs, if it is enabled
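
To illustrate the underlying problem, here is a minimal sketch, assuming
Python 3. The helper names (naive_split, safe_split) are hypothetical and
are not taken from the patch series or from hgext/win32mbcs. It only shows
why byte-level matching on '\\' is unsafe for cp932, and how decoding
first avoids it:

```python
# Hypothetical helpers for illustration only; not Mercurial code.

def naive_split(path_bytes):
    """Split on the backslash byte, ignoring multi-byte characters."""
    return path_bytes.split(b'\\')


def safe_split(path_bytes, encoding='cp932'):
    """Decode first, split on the backslash character, then re-encode."""
    text = path_bytes.decode(encoding)
    return [part.encode(encoding) for part in text.split('\\')]


# In cp932 the second byte of U+8868 is 0x5c, the same byte as '\\'.
name = '\u8868\u793a'.encode('cp932')
path = b'dir\\' + name

broken = naive_split(path)   # the file name is cut in the middle
intact = safe_split(path)    # the file name stays in one piece
```

With these definitions, naive_split() cuts the file name at the trailing
byte of its first character, while safe_split() returns the directory and
the file name unchanged.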

> > As you noticed, wrapping/hooking points are scattered in widely, so I
> > think that this implementation is not so good. But I don't have any
> > other ideas.
> >
> > Are there any other ideas to solve this problem ?
> 
> The only viable solution is to consistently use utf-8 inside Mercurial.

I think so, too :-)

> > BTW, how is "using Unicode API on Windows" plan progressing ?
> >
> >    http://www.selenic.com/pipermail/mercurial-devel/2011-December/036385.html
> 
> No progress at all. Windows users do apparently not care enough about 
> the problem to contribute in any way.

I had thought that the mail from Matt announced the start of work by
the core development team, but that was my misunderstanding, wasn't
it ?

Some people in Japan and I are very interested in contributing to
solve this problem, because it is very important for spreading
Mercurial in Japan !

Should we start implementing immediately, according to the policy
described by Matt ?

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp