[PATCH] highlight: fixes garbled text in non-UTF-8 environment

Martin Geisler mg at lazybytes.net
Sat Sep 5 06:25:37 CDT 2009


Christian Ebert <blacktrash at gmx.net> writes:

> * Yuya Nishihara on Wednesday, September 02, 2009 at 11:16:03 -0000
>
>> Current implementation, db7557359636 (issue1341):
>>
>> 1. Convert original `text`, which is treated as UTF-8, to locale's
>>    encoding. `encoding.tolocal()` is the method to convert from
>>    internal UTF-8 to local. If original `text` is not UTF-8, e.g.
>>    Japanese EUC-JP, some characters become garbled here.
>
> So why did iso-8859-1 content not become garbled? Probably because it
> was in fallbackencoding.

Yes, but I think that was more luck than actual design :-)
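
To illustrate why iso-8859-1 content can survive while EUC-JP content
gets garbled, here is a minimal Python sketch. It is not Mercurial's
actual `encoding.tolocal()`: the name `to_local_sketch` and its
`fallback` parameter are made up for illustration, and the only
assumption is that input which fails to decode as UTF-8 is retried
with a single-byte fallback encoding.

```python
def to_local_sketch(data: bytes, fallback: str = "iso-8859-1") -> str:
    """Hypothetical stand-in for a UTF-8-first decode with a fallback.

    Not Mercurial's real implementation; it only models the behaviour
    discussed in this thread.
    """
    try:
        # Treat the input as UTF-8, as the highlight code does.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # A single-byte fallback never fails, but it is only correct
        # when the data really was written in that encoding.
        return data.decode(fallback)


latin1_bytes = "café".encode("iso-8859-1")
eucjp_bytes = "日本語".encode("euc-jp")

# The fallback happens to match the real encoding: text survives.
print(to_local_sketch(latin1_bytes))

# The fallback does not match: every byte becomes an unrelated
# Latin-1 character, so the output is garbled.
print(to_local_sketch(eucjp_bytes))

# Decoding with the file's real encoding gives the right text.
print(eucjp_bytes.decode("euc-jp"))
```

So the iso-8859-1 case only works because the fallback coincides with
the file's encoding, which is the "luck" I mean.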

> Have you checked whether this still highlights the text in question?
> With this patch I lose all highlighting!

That is strange -- it works as advertised here: I tested with some
Japanese characters from our translation and they were rendered
correctly when the encoding matched the file encoding, and garbled
otherwise. I've only tested with 'hg serve --encoding xxx', not
hgweb.cgi, if that makes a difference.

> I don't know why exactly. Have to investigate. There are so many
> places where encoding can be set:
>
> - hgrc files
> - environment
> - [web].encoding
> - hgwebdir.cgi
>
> etc. Except by experimenting I don't even know which gets
> precedence. E.g. I just discovered that setting [web].encoding to
> something like iso-8859-1 causes a traceback (not because of your
> patch) whereas ascii doesn't (just garbling).

Okay, that's bad...

> The test should probably contain not a .txt file (won't be
> highlighted anyway) but a file that is recognized by extension
> (and may contain non-ascii characters).

I'm pretty sure a .txt file is "highlighted" by the TextLexer from
Pygments.

-- 
Martin Geisler

VIFF (Virtual Ideal Functionality Framework) brings easy and efficient
SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/.