[PATCH] highlight: pass hgweb.encoding to lexers and formatter

Christian Ebert blacktrash at gmx.net
Tue Dec 11 17:29:15 CST 2007


* Matt Mackall on Tuesday, December 11, 2007 at 17:07:26 -0600
> On Tue, Dec 11, 2007 at 11:40:25PM +0100, Christian Ebert wrote:
>> * Matt Mackall on Tuesday, December 11, 2007 at 15:50:00 -0600
>>> On Tue, Dec 11, 2007 at 10:21:23PM +0100, Christian Ebert wrote:
>>>> The following is needed to avoid a nasty backtrace when a file
>>>> contains non-ascii characters.
>>>> 
>>>> Should perhaps be tested in non-utf locale; also I am not
>>>> entirely sure if the lexers should get passed util._encoding.
>>>> Anyway this gave consistent results re encoding with highlight
>>>> turned on and off.
>>> 
>>> Ugh. Apps should assume that regardless of what encoding they're in,
>>> someone's going to throw them a byte that can't be decoded. If it was
>>> throwing an exception when it was assuming ASCII, it will still throw
>>> exceptions when you try to pass off Latin-1 as UTF-8 or whatever. So
>>> this fix is insufficient.
>> 
>> No doubt. I am rather confused by the pygments docs (input charset
>> iso-8859-1 is assumed???) too, see below.
>> 
>>> Odds are good that pygments is hopelessly infected with Unicode
>>> braindamage, so I somehow doubt there -is- a good fix.
>> 
>> Frankly, I just tried "to make it work" for my machine. But
>> perhaps someone more savvy with pygments has an idea; or can make
>> something coherent out of the docs. I quote the relevant section
>> for reference:
> ... 
>> Since Pygments 0.6, all lexers use unicode strings internally. Because of that
>> you might encounter the occasional `UnicodeDecodeError` if you pass strings with the
>> wrong encoding.
> 
> Yeah, that's the brain damage I was talking about. 
> 
>> The best way is to pass Pygments unicode objects. In that case you can't get
>> unexpected output.
> 
> And that's a bit of a strong statement. Anyway, this is probably the
> best route - simply decode the strings yourself (using util.tolocal)

That's what I tried first, because it seemed logical -- and I still
got a backtrace. But I'll try again, harder ;-)
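
For reference, a minimal sketch of the "decode it yourself" route, so
that whatever is handed to the lexers is already a unicode object and
an undecodable byte can no longer raise. The helper name decode_lossy
and its parameters are made up for illustration; this is neither
Mercurial's util.tolocal nor anything from Pygments, and it assumes
that falling back to latin-1 is acceptable for display purposes:

```python
def decode_lossy(data, encodings=("utf-8",), fallback="latin-1"):
    """Turn a byte string into text without raising UnicodeDecodeError.

    Hypothetical helper: try each candidate encoding strictly, then
    fall back to a permissive decode so stray bytes cannot blow up.
    """
    for enc in encodings:
        try:
            # Strict decode: succeeds only if every byte is valid in enc.
            return data.decode(enc)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or unknown codec name: try the next candidate.
            continue
    # latin-1 maps every byte to a code point; "replace" covers the
    # case where a stricter fallback codec is passed in instead.
    return data.decode(fallback, "replace")
```

The returned text is what would then be passed on to the highlighter
instead of the raw bytes. The candidate list would presumably start
with the configured encoding, but that wiring is left out here.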

c
-- 
keyword extension for Mercurial (http://selenic.com/mercurial):
<http://www.blacktrash.org/hg/hgkeyword/> (incl. 0.9.2 compatible branch)
Mercurial crew development repository + keyword extension:
<http://www.blacktrash.org/hg/hg-crew-keyword/>

