RFC: safe pattern matching for problematic encoding

Matt Mackall mpm at selenic.com
Wed May 23 14:03:26 CDT 2012


On Wed, 2012-05-23 at 22:29 +0900, FUJIWARA Katsunori wrote:
> At Wed, 23 May 2012 14:53:43 +0200,
> Mads wrote:
> > 
> > On 23/05/12 14:38, FUJIWARA Katsunori wrote:
> > > Hi, devels.
> > >
> > > I'm working to achieve safe pattern matching/parsing for problematic
> > > encodings (e.g.: cp932), in which strings may contain '\\' as a part
> > > of multi-byte characters.
> > 
> > Is that a part of improving hgext/win32mbcs.py ? Or how are they related?
> 
> In my patch series:
> 
>   - add some wrapping/hooking points to core code, and
>   - wrap/hook them by hgext/win32mbcs, if it is enabled
> 
> > > As you noticed, the wrapping/hooking points are widely scattered, so I
> > > think that this implementation is not so good. But I don't have any
> > > other ideas.
> > >
> > > Are there any other ideas to solve this problem ?
> > 
> > The only viable solution is to consistently use utf-8 inside Mercurial.
> 
> I think so, too :-)
> 
> > > BTW, how is the "using Unicode API on Windows" plan progressing ?
> > >
> > >    http://www.selenic.com/pipermail/mercurial-devel/2011-December/036385.html
> > 
> > No progress at all. Windows users apparently do not care enough about
> > the problem to contribute in any way.
> 
> I had thought that the mail from Matt announced the start of work
> by the core development team, but that was my misunderstanding,
> wasn't it ?

That was this:

http://mercurial.markmail.org/thread/gdutukafpo4euc7i

As you can see, no one expressed ANY interest at all, despite
complaining about this issue more or less constantly.
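
For readers unfamiliar with the cp932 problem quoted above, here is a
minimal Python 3 sketch of it. This is not Mercurial or win32mbcs code:
the helper names naive_split and safe_split are hypothetical and exist
only for this illustration. It shows that the second byte of some cp932
characters is 0x5c ('\\'), so a byte-level split on the path separator
cuts a character in half, while decoding first does not.

```python
# Illustrative sketch only; naive_split/safe_split are hypothetical
# helpers, not part of Mercurial or hgext/win32mbcs.

def naive_split(raw: bytes) -> list:
    """Split a byte path on the backslash byte, ignoring the encoding."""
    return raw.split(b"\\")


def safe_split(raw: bytes, encoding: str = "cp932") -> list:
    """Decode first, split on the backslash character, then re-encode."""
    parts = raw.decode(encoding).split("\\")
    return [part.encode(encoding) for part in parts]


# U+8868 encodes to the two bytes 0x95 0x5c in cp932; the trail byte
# has the same value as an ASCII backslash.
path = "dir\\\u8868\\file.txt"
raw = path.encode("cp932")

broken = naive_split(raw)   # the multi-byte character is cut in two
intact = safe_split(raw)    # the character stays whole

print(broken)
print(intact)
```

The byte-level split yields four components, one of them empty, because
it sees three backslash bytes; the decode-first split yields the three
components the user intended. Decoding every byte string at each such
point is what makes the hooking points so numerous.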

-- 
Mathematics is the supreme nostalgia of our time.