[PATCH STABLE] i18n: use utf-8 encoding to show about converted revisions (issue3393)

Fri Apr 27 12:04:11 CDT 2012

On Tue, 2012-04-24 at 21:55 +0900, FUJIWARA Katsunori wrote:
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> # Date 1335271935 -32400
> # Branch stable
> # Node ID 9bbc6c974eea21102e08ccfb9324ae79504ee69a
> # Parent  09dd707b522a766b7d5e5fd221c4e68ac735f4d9
> i18n: use utf-8 encoding to show about converted revisions (issue3393)

This patch looks incorrect to me...

> status information of "hg convert" contains byte sequences in two
> different encodings, when:
> 
>   - non utf-8 encoding is chosen as one for Mercurial,
>   - the language using non-ascii characters in localized messages is
>     chosen by locale setting, and
>   - any converted revisions have description using non-ascii characters
> 
> this occurs because messages shown by "hg convert" are encoded in
> utf-8 forcibly, but descriptions of converted revisions are encoded in
> "orig_encoding" via "recode()" method.

...but I can't be sure because neither the bug report nor the patch
contains an example of the problem. I don't want to work backwards from
your analysis to create my own (possibly incorrect) example and then
forward again.

I think rather than do this sort of thing, we should actually try to
solve the encoding problem that's pervasive in convert. The answer here
is probably to use localstrs:

http://mercurial.selenic.com/wiki/EncodingStrategy#Round-trip_conversion
http://www.selenic.com/hg/file/4bce649a2b0f/mercurial/encoding.py#l51

Thus, rather than making a (bogus) change of setting encoding.encoding
to UTF-8, we should take the Unicode/UTF-8/whatever metadata input from
the SCM sources, wrap it in a localstr that's compatible with the local
encoding, then unwrap it in the sink, losslessly.
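
To make the round-trip idea concrete, here is a minimal standalone sketch.
It is not Mercurial's actual code: LocalStr, to_local and from_local are
hypothetical names for illustration, and the sketch assumes the source
hands us UTF-8 bytes. The wrapper carries the local-encoding bytes for
display and keeps the original UTF-8 alongside, so the sink can recover it
even when the local encoding cannot represent every character.

```python
class LocalStr(bytes):
    """Local-encoding bytes that remember their original UTF-8 form.

    Hypothetical illustration of the localstr idea, not Mercurial's class.
    """

    def __new__(cls, utf8_bytes, local_bytes):
        obj = super().__new__(cls, local_bytes)
        obj._utf8 = utf8_bytes  # kept so the conversion can be undone
        return obj


def to_local(utf8_bytes, local_encoding):
    """Source side: convert UTF-8 metadata to the local encoding."""
    text = utf8_bytes.decode("utf-8")
    try:
        # Lossless case: plain bytes are enough, no wrapper needed.
        return text.encode(local_encoding)
    except UnicodeEncodeError:
        # Lossy case: show '?' for unrepresentable characters, but
        # carry the original UTF-8 along inside the wrapper.
        lossy = text.encode(local_encoding, "replace")
        return LocalStr(utf8_bytes, lossy)


def from_local(local_bytes, local_encoding):
    """Sink side: recover UTF-8, using the cached original if present."""
    if isinstance(local_bytes, LocalStr):
        return local_bytes._utf8
    return local_bytes.decode(local_encoding).encode("utf-8")


# A description mixing Latin-1 and non-Latin-1 characters.
description = "caf\u00e9 \u65e5\u672c".encode("utf-8")
shown = to_local(description, "latin-1")      # what the user would see
stored = from_local(shown, "latin-1")         # what the sink would write
```

With this shape, everything printed to the user stays in one encoding,
while the sink still receives the original bytes unchanged.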

-- 
Mathematics is the supreme nostalgia of our time.