RFC: safe pattern matching for problematic encoding

Martin Geisler martin at geisler.net
Wed May 23 15:36:45 CDT 2012


Matt Mackall <mpm at selenic.com> writes:

> On Wed, 2012-05-23 at 22:29 +0900, FUJIWARA Katsunori wrote:
>> At Wed, 23 May 2012 14:53:43 +0200,
>> Mads wrote:
>> > 
>> > On 23/05/12 14:38, FUJIWARA Katsunori wrote:
>> > > Hi, devels.
>> > >
>> > > I'm working to achieve safe pattern matching/parsing for problematic
>> > > encodings (e.g.: cp932), in which strings may contain '\\' as a part
>> > > of multi-byte characters.
>> > 
>> > Is that a part of improving hgext/win32mbcs.py ? Or how are they related?
>> 
>> In my patch series, I:
>> 
>>   - add some wrapping/hooking points to the core code, and
>>   - wrap/hook them in hgext/win32mbcs, if it is enabled
>> 
>> > > As you noticed, the wrapping/hooking points are scattered widely, so I
>> > > don't think this implementation is very good. But I don't have any
>> > > other ideas.
>> > >
>> > > Are there any other ideas for solving this problem?
>> > 
>> > The only viable solution is to consistently use utf-8 inside Mercurial.
>> 
>> I think so, too :-)
>> 
>> > > BTW, how is the "using Unicode API on Windows" plan progressing?
>> > >
>> > >    http://www.selenic.com/pipermail/mercurial-devel/2011-December/036385.html
>> > 
>> > No progress at all. Windows users apparently do not care enough
>> > about the problem to contribute in any way.
>> 
>> I had thought that the mail from Matt announced the start of work
>> by the core development team, but I misunderstood it, didn't I?
>
> That was this:
>
> http://mercurial.markmail.org/thread/gdutukafpo4euc7i
>
> As you can see, no one expressed ANY interest at all, despite
> complaining about this issue more or less constantly.

I think I mentioned on IRC that I really like the idea. Using the
Unicode API is something we should have done from the beginning, so it
will be great if we begin using it on Windows (where users feel the pain
the most when they use non-ASCII characters in their file names).
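A minimal Python 3 sketch of the cp932 problem raised at the top of the
thread, for illustration only (it is not the hgext/win32mbcs code, and the
helper names are hypothetical). In cp932 the character '表' (U+8868) is
encoded as b'\x95\x5c', so its trail byte is identical to '\\'. Code that
works on raw bytes then sees path separators that are not there, while code
that decodes to Unicode first, in the spirit of the Unicode approach
discussed above, does not.

```python
# Hypothetical helpers contrasting byte-level and decoded (Unicode) handling
# of cp932 paths. Not Mercurial's actual implementation.

def naive_split(path_bytes):
    # Treats every 0x5C byte as a separator, even when it is the
    # trail byte of a double-byte cp932 character.
    return path_bytes.split(b"\\")

def safe_split(path_bytes, encoding="cp932"):
    # Decode first so 0x5C trail bytes stay inside their characters,
    # split on the real backslash, then re-encode each component.
    text = path_bytes.decode(encoding)
    return [part.encode(encoding) for part in text.split("\\")]

def naive_endswith_sep(path_bytes):
    # Byte-level check: wrongly true for names ending in e.g. '表'.
    return path_bytes.endswith(b"\\")

def safe_endswith_sep(path_bytes, encoding="cp932"):
    # Character-level check on the decoded string.
    return path_bytes.decode(encoding).endswith("\\")

if __name__ == "__main__":
    path = "表\\a.txt".encode("cp932")   # b'\x95\x5c\x5ca.txt'
    print(naive_split(path))             # three components, first one broken
    print(safe_split(path))              # two components: '表' and 'a.txt'
    name = "表".encode("cp932")
    print(naive_endswith_sep(name), safe_endswith_sep(name))
```

The byte-level versions report a spurious empty component and a trailing
separator for a name that has none; the decoded versions do not.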


-- 
Martin Geisler

aragost Trifork
Commercial Mercurial support
http://aragost.com/mercurial/