RFC: safe pattern matching for problematic encoding

Wed May 23 10:21:51 CDT 2012

On Wed, 23 May 2012 15:40:39 +0200
Mads <mads at kiilerich.com> wrote:
> 
> Oh please - there is no need for trolling here.
> 
> Like it or not, Mercurial will never use Python unicode strings 
> internally. It is a deliberate and well-informed choice - but the rest 
> of the world is of course free to disagree.
> 
> That does of course not mean that Mercurial can't or won't have full 
> support for unicode. It just means that we prefer to do the encoding 
> once at the user interface / filename boundary instead of using several 
> intermediate and potentially lossy re-codings.

There is inherently no "intermediate and potentially lossy re-coding"
in the use of unicode objects. No more than in recoding to utf-8,
anyway.
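
To illustrate the point, here is a minimal sketch (Python 3 syntax, where
`str` is the unicode type). The sample name and the `roundtrip` helper are
made up for the example and have nothing to do with Mercurial's actual code.
It only shows that decoding bytes of a known encoding to a unicode object and
encoding them back returns the same bytes, and that recoding to utf-8 behaves
the same way:

```python
def roundtrip(data: bytes, encoding: str) -> bytes:
    """Decode bytes to a unicode string, then encode back to the same charset."""
    return data.decode(encoding).encode(encoding)


# Hypothetical sample: a non-ASCII author name stored as latin-1 bytes.
raw_latin1 = "Zoë Müller".encode("latin-1")

# bytes -> unicode -> bytes in the same encoding gives back the input.
assert roundtrip(raw_latin1, "latin-1") == raw_latin1

# Recoding to utf-8 and back is equally lossless for this input.
as_utf8 = raw_latin1.decode("latin-1").encode("utf-8")
assert as_utf8.decode("utf-8").encode("latin-1") == raw_latin1

# Loss only appears when the target charset cannot represent a character,
# and that is true whatever the intermediate representation is.
lossy = "Zoë Müller".encode("ascii", errors="replace")
assert lossy == b"Zo? M?ller"
```

Whether the intermediate value is a unicode object or a utf-8 byte string,
the lossy step is the same one: encoding to a charset that lacks the
character.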

That said, I understand that you've made a design decision and would
like to stick to it. I certainly don't think it's inherently better,
though (it's probably worse in several respects, actually, as I found
out when fixing our email hook to properly reflect non-ASCII
author names and commit messages).

Regards

Antoine.