[PATCH] highlight: pass hgweb.encoding to lexers and formatter
Christian Ebert
blacktrash at gmx.net
Tue Dec 11 17:29:15 CST 2007
* Matt Mackall on Tuesday, December 11, 2007 at 17:07:26 -0600
> On Tue, Dec 11, 2007 at 11:40:25PM +0100, Christian Ebert wrote:
>> * Matt Mackall on Tuesday, December 11, 2007 at 15:50:00 -0600
>>> On Tue, Dec 11, 2007 at 10:21:23PM +0100, Christian Ebert wrote:
>>>> The following is needed to avoid a nasty backtrace when a file
>>>> contains non-ASCII characters.
>>>>
>>>> Should perhaps be tested in a non-UTF-8 locale; also I am not
>>>> entirely sure whether the lexers should be passed util._encoding.
>>>> Anyway, this gave consistent results re encoding with highlight
>>>> turned on and off.
>>>
>>> Ugh. Apps should assume that regardless of what encoding they're in,
>>> someone's going to throw them a byte that can't be decoded. If it was
>>> throwing an exception when it was assuming ASCII, it will still throw
>>> exceptions when you try to pass off Latin-1 as UTF-8 or whatever. So
>>> this fix is insufficient.
>>
>> No doubt. I am rather confused by the pygments docs (input charset
>> iso-8859-1 is assumed???) too, see below.
>>
>>> Odds are good that pygments is hopelessly infected with Unicode
>>> braindamage, so I somehow doubt there -is- a good fix.
>>
>> Frankly, I just tried "to make it work" for my machine. But
>> perhaps someone more savvy with pygments has an idea; or can make
>> something coherent out of the docs. I quote the relevant section
>> for reference:
> ...
>> Since Pygments 0.6, all lexers use unicode strings internally. Because of that
>> you might encounter the occasional `UnicodeDecodeError` if you pass strings with the
>> wrong encoding.
>
> Yeah, that's the brain damage I was talking about.
>
>> The best way is to pass Pygments unicode objects. In that case you can't get
>> unexpected output.
>
> And that's a bit of a strong statement. Anyway, this is probably the
> best route - simply decode the strings yourself (using util.tolocal)

That's what I tried first, because it seemed logical -- and I still
got a backtrace. But I'll try again, harder ;-)
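
For reference, the "decode it yourself" route could look roughly like
the sketch below. It is written for current Python 3 rather than the
Python 2 of this thread, it does not use util.tolocal or any Mercurial
internals, and the name to_unicode and its parameters are made up for
illustration. The only idea is that bytes are turned into text before a
lexer sees them, with a fallback that cannot raise:

```python
def to_unicode(data, encoding="utf-8", fallback="latin-1"):
    """Decode bytes to text without raising UnicodeDecodeError.

    Hypothetical helper: 'encoding' stands in for whatever the web
    interface is configured to use; 'fallback' is an assumption.
    """
    if isinstance(data, str):
        # Already text, nothing to decode.
        return data
    try:
        return data.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # latin-1 maps every byte value to a code point, so this
        # decode always succeeds, even if the result is not what
        # the file's author intended.
        return data.decode(fallback, "replace")
```

The returned text could then be handed to the lexer instead of the raw
bytes. Whether the output matches what hgweb shows with highlighting
turned off would still need checking in a non-UTF-8 locale.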
c
--
keyword extension for Mercurial (http://selenic.com/mercurial):
<http://www.blacktrash.org/hg/hgkeyword/> (incl. 0.9.2 compatible branch)
Mercurial crew development repository + keyword extension:
<http://www.blacktrash.org/hg/hg-crew-keyword/>