RFC: safe pattern matching for problematic encoding

Mads mads at kiilerich.com
Wed May 23 08:40:39 CDT 2012


On 23/05/12 15:14, Antoine Pitrou wrote:
> On Wed, 23 May 2012 14:53:43 +0200
> Mads<mads at kiilerich.com>  wrote:
>>> As you noticed, wrapping/hooking points are scattered in widely, so I
>>> think that this implementation is not so good. But I don't have any
>>> other ideas.
>>>
>>> Are there any other ideas to solve this problem ?
>> The only viable solution is to consistently use utf-8 inside Mercurial.
> Or to consistently use unicode strings ;)

Oh please - there is no need for trolling here.

Like it or not, Mercurial will never use Python unicode strings 
internally. It is a deliberate and well-informed choice - but the rest 
of the world is of course free to disagree.

That does not, of course, mean that Mercurial can't or won't have full
support for unicode. It just means that we prefer to do the encoding
once, at the user interface / filename boundary, instead of using several
intermediate and potentially lossy re-codings.
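
To illustrate the difference, here is a minimal sketch, assuming Python 3.
The helper names (to_internal, from_internal, recode) are hypothetical and
are not Mercurial code. It only shows why encoding once at the boundary
round-trips cleanly, while an intermediate re-coding through a narrower
encoding can lose characters.

```python
# Hypothetical helpers for illustration only; not Mercurial's API.

def to_internal(text: str) -> bytes:
    """Encode once, at the user-interface boundary."""
    return text.encode("utf-8")


def from_internal(data: bytes) -> str:
    """Decode once, when handing data back to the user."""
    return data.decode("utf-8")


def recode(data: bytes, src: str, dst: str) -> bytes:
    """An intermediate re-coding step.

    Characters that do not exist in dst are replaced with '?',
    which is where information gets lost.
    """
    return data.decode(src).encode(dst, errors="replace")


original = "r\u00e9sum\u00e9 \u65e5\u672c\u8a9e"  # Latin-1 letters plus CJK

# Encode once, keep the bytes untouched internally, decode once.
internal = to_internal(original)
single_pass = from_internal(internal)

# Same data, but pushed through an intermediate Latin-1 re-coding.
via_latin1 = recode(internal, "utf-8", "latin-1")
multi_pass = from_internal(recode(via_latin1, "latin-1", "utf-8"))

print(single_pass == original)  # the single-pass path is lossless
print(multi_pass == original)   # the re-coded path is not
```

The accented letters survive the detour because Latin-1 can represent
them; the CJK characters cannot be represented and come back as '?'.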

/Mads


