highlight: fixes garbled text in non-UTF-8 environment

Yuya Nishihara youjah at gmail.com
Sat Sep 5 06:34:43 CDT 2009


Hello,

Christian Ebert wrote:
> > # HG changeset patch
> > # User Yuya Nishihara <yuya at tcha.org>
> > # Date 1251527055 -32400
> > # Node ID 54e7217e12558be85f8ae410f1a168b58b966bae
> > # Parent  37042e8b3b342b2e380d8be3e3f7692584c92d33
> > highlight: fixes garbled text in non-UTF-8 environment
...
> > 
> > Current implementation, db7557359636 (issue1341):
> > 1. Convert original `text`, which is treated as UTF-8, to locale's encoding.
> >   `encoding.tolocal()` is the method to convert from internal UTF-8 to local.
> >   If original `text` is not UTF-8, e.g. Japanese EUC-JP, some characters
> >   become garbled here.
> 
> So why did iso-8859-1 content not become garbled? Probably
> because it was in fallbackencoding.

Yes. If the content contains a non-ASCII character, e.g. a-umlaut, it fails
to decode as UTF-8 and is decoded using fallbackencoding, 'iso-8859-1'.
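
As a rough illustration of that fallback behaviour, here is a minimal sketch.
It is not Mercurial's actual code: the name to_local_sketch and its parameters
are made up for this example, and it is written for Python 3.

```python
def to_local_sketch(data, local_encoding, fallback="iso-8859-1"):
    """Hypothetical stand-in for a UTF-8 -> local conversion with a fallback."""
    # Try strict UTF-8 first, then the fallback encoding.
    for candidate in ("utf-8", fallback):
        try:
            text = data.decode(candidate)
        except UnicodeDecodeError:
            continue
        return text.encode(local_encoding, "replace")
    # Last resort: decode as UTF-8, replacing undecodable bytes.
    return data.decode("utf-8", "replace").encode(local_encoding, "replace")


# a-umlaut stored as iso-8859-1 is not valid UTF-8, so the fallback decodes it.
latin1_bytes = "\u00e4".encode("iso-8859-1")
umlaut_result = to_local_sketch(latin1_bytes, "utf-8")

# EUC-JP bytes are not valid UTF-8 either, but iso-8859-1 accepts any byte,
# so they are silently mis-decoded and come out garbled.
japanese = "\u65e5\u672c\u8a9e"
eucjp_result = to_local_sketch(japanese.encode("euc-jp"), "utf-8")
```

In this sketch the iso-8859-1 input survives, while the EUC-JP input is
converted without any error but no longer matches the original characters.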

> 
> > 2. pygmentize, with no UnicodeDecodeError.
> > 
> > This patch:
> > 1. Convert original `text`, which is treated as locale's encoding, to unicode.
> >   Pygments prefers unicode object than raw str. [1]_
> >   If original `text` is not encoded by locale's encoding, some characters
> >   become garbled here.
> > 2. pygmentize, also with no UnicodeDecodeError :)
> > 3. Convert unicode back to raw str, which is encoded by locale's.
> 
> Have you checked whether this still highlights the text in
> question? With this patch I lose all highlighting!

I want to reproduce the problem, but currently I cannot.

I've tested *.py files in 4 encodings:
  iso-8859-1 (contains a-umlaut), cp932 (Japanese), euc-jp (Japanese), utf-8
against 5 HGENCODING values:
  ascii, iso-8859-1, cp932, euc-jp, utf-8
using HGENCODING=xxx path/to/crew-stable/hg serve
  or HGENCODING=xxx path/to/crew/hg serve
and everything is okay on my machine. No highlighting is lost.
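
The decode / highlight / encode round trip from the patch description, run
over the same matrix, could be sketched roughly as follows. This is only an
illustration under assumptions: fake_highlight is a hypothetical stand-in for
the Pygments call, roundtrip and the sample strings are made up, the 'replace'
error handler is an assumption, and the code targets Python 3.

```python
FILE_ENCODINGS = ["iso-8859-1", "cp932", "euc-jp", "utf-8"]
LOCAL_ENCODINGS = ["ascii", "iso-8859-1", "cp932", "euc-jp", "utf-8"]

# Made-up sample contents, one per file encoding.
SAMPLES = {
    "iso-8859-1": "# \u00e4\n",
    "cp932": "# \u65e5\u672c\u8a9e\n",
    "euc-jp": "# \u65e5\u672c\u8a9e\n",
    "utf-8": "# \u00e4 \u65e5\u672c\u8a9e\n",
}


def fake_highlight(text):
    """Hypothetical stand-in for Pygments: wraps the text in a span."""
    return "<span>" + text + "</span>"


def roundtrip(raw, local_encoding):
    # 1. Treat the raw bytes as the locale's encoding and decode to str.
    text = raw.decode(local_encoding, "replace")
    # 2. Highlight the decoded text.
    colorized = fake_highlight(text)
    # 3. Encode the result back to the locale's encoding.
    return colorized.encode(local_encoding, "replace")


# Run every file encoding against every local encoding.
results = {}
for file_enc in FILE_ENCODINGS:
    raw = SAMPLES[file_enc].encode(file_enc)
    for local_enc in LOCAL_ENCODINGS:
        results[(file_enc, local_enc)] = roundtrip(raw, local_enc)
```

When the file encoding matches the local encoding, the text survives intact.
When they differ, some characters may be replaced, but no UnicodeDecodeError
is raised because of the 'replace' handler.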

My environment is:
--
$ python -V
Python 2.5.4
$ echo $LANG
ja_JP.UTF-8
$ iconv -V
iconv (EGLIBC) 2.9
$ python -c 'import pygments; print pygments.__version__;'
1.0
--

Yuya.

