highlight: fixes garbled text in non-UTF-8 environment
Yuya Nishihara
youjah at gmail.com
Sat Sep 5 06:34:43 CDT 2009
Hello,
Christian Ebert wrote:
> > # HG changeset patch
> > # User Yuya Nishihara <yuya at tcha.org>
> > # Date 1251527055 -32400
> > # Node ID 54e7217e12558be85f8ae410f1a168b58b966bae
> > # Parent 37042e8b3b342b2e380d8be3e3f7692584c92d33
> > highlight: fixes garbled text in non-UTF-8 environment
...
> >
> > Current implementation, db7557359636 (issue1341):
> > 1. Convert original `text`, which is treated as UTF-8, to locale's encoding.
> > `encoding.tolocal()` is the method to convert from internal UTF-8 to local.
> > If original `text` is not UTF-8, e.g. Japanese EUC-JP, some characters
> > become garbled here.
>
> So why did iso-8859-1 content not become garbled? Probably
> because it was in fallbackencoding.
Yes. If it contains a non-ASCII character, e.g. a-umlaut, it fails to decode
as UTF-8 and is then decoded using fallbackencoding, 'iso-8859-1'.
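To illustrate why iso-8859-1 content survives while EUC-JP content does not, here is a rough sketch of the fallback behaviour described above. It is written in Python 3 for readability and is not Mercurial's actual `encoding.tolocal()`; the function name `to_local_sketch` and its parameters are made up for this example.

```python
def to_local_sketch(data: bytes, local_encoding: str,
                    fallback_encoding: str = "iso-8859-1") -> bytes:
    """Hypothetical stand-in for the conversion discussed above.

    Try strict UTF-8 first, then the fallback encoding, and re-encode
    the result into the local encoding.
    """
    for candidate in ("utf-8", fallback_encoding):
        try:
            text = data.decode(candidate)  # strict decoding
        except UnicodeDecodeError:
            continue
        return text.encode(local_encoding, "replace")
    # Last resort: lossy UTF-8 decoding.
    return data.decode("utf-8", "replace").encode(local_encoding, "replace")


latin1_bytes = "ä".encode("iso-8859-1")   # not valid UTF-8
eucjp_bytes = "日本語".encode("euc-jp")    # not valid UTF-8 either

# The fallback matches the real encoding, so the umlaut survives.
latin1_result = to_local_sketch(latin1_bytes, "utf-8")

# The fallback "succeeds" on any byte sequence, so EUC-JP bytes are
# silently misread as iso-8859-1 and come out garbled.
eucjp_result = to_local_sketch(eucjp_bytes, "utf-8")
```

Because iso-8859-1 assigns a character to every byte value, the fallback never fails, which is why text in other non-UTF-8 encodings is garbled rather than rejected.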
>
> > 2. pygmentize, with no UnicodeDecodeError.
> >
> > This patch:
> > 1. Convert original `text`, which is treated as locale's encoding, to unicode.
> > Pygments prefers unicode object than raw str. [1]_
> > If original `text` is not encoded by locale's encoding, some characters
> > become garbled here.
> > 2. pygmentize, also with no UnicodeDecodeError :)
> > 3. Convert unicode back to raw str, which is encoded by locale's.
>
> Have you checked whether this still highlights the text in
> question? With this patch I lose all highlighting!
I tried to reproduce the problem, but so far I cannot.
I've tested *.py files in 4 encodings:
iso-8859-1 (contains umlaut a), cp932 (japanese), euc-jp (japanese), utf-8
against 5 HGENCODING values:
ascii, iso-8859-1, cp932, euc-jp, utf-8
by running HGENCODING=xxx path/to/crew-stable/hg serve
or HGENCODING=xxx path/to/crew/hg serve
and everything is okay on my machine. No highlighting is lost.
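The same matrix can be approximated without a running server. The sketch below is Python 3 and only models the decode / highlight / encode round trip from the patch description; `fake_highlight` is a hypothetical placeholder for the Pygments call, and the sample strings are assumptions, not the files I actually tested.

```python
def fake_highlight(text: str) -> str:
    """Hypothetical placeholder for the real highlighter (identity here)."""
    return text


def roundtrip(raw: bytes, local_encoding: str) -> bytes:
    """Decode as the local encoding, highlight, encode back.

    'replace' is used in both directions so that undecodable or
    unencodable characters are substituted instead of raising.
    """
    text = raw.decode(local_encoding, "replace")
    return fake_highlight(text).encode(local_encoding, "replace")


# Assumed sample content for each file encoding.
samples = {
    "iso-8859-1": "# ä",
    "cp932": "# 日本語",
    "euc-jp": "# 日本語",
    "utf-8": "# 日本語 ä",
}
hg_encodings = ["ascii", "iso-8859-1", "cp932", "euc-jp", "utf-8"]

results = {}
for file_encoding, sample in samples.items():
    raw = sample.encode(file_encoding)
    for hg_encoding in hg_encodings:
        # Must not raise, whatever the combination.
        results[(file_encoding, hg_encoding)] = roundtrip(raw, hg_encoding)
```

When the file encoding and the local encoding match, the bytes should come back unchanged; with a mismatch some characters are replaced, but no exception is raised.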
My environment is:
--
$ python -V
Python 2.5.4
$ echo $LANG
ja_JP.UTF-8
$ iconv -V
iconv (EGLIBC) 2.9
$ python -c 'import pygments; print pygments.__version__;'
1.0
--
Yuya.