[PATCH STABLE] i18n: use utf-8 encoding to show about converted revisions (issue3393)

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Tue Apr 24 07:55:00 CDT 2012


# HG changeset patch
# User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
# Date 1335271935 -32400
# Branch stable
# Node ID 9bbc6c974eea21102e08ccfb9324ae79504ee69a
# Parent  09dd707b522a766b7d5e5fd221c4e68ac735f4d9
i18n: use utf-8 encoding to show about converted revisions (issue3393)

the status output of "hg convert" contains byte sequences in two
different encodings when all of the following hold:

  - a non utf-8 encoding is chosen for Mercurial,
  - the locale setting selects a language whose localized messages
    contain non-ascii characters, and
  - any converted revision has a description containing non-ascii
    characters

this occurs because "hg convert" forces its own messages to be
encoded in utf-8, while descriptions of converted revisions are
re-encoded in "orig_encoding" by "recode()".
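
as a rough illustration of the mismatch (not code from this patch), the
Python 3 sketch below joins a utf-8 encoded message with a description
encoded in another encoding. "cp932" stands in for "orig_encoding" and
both strings are made-up examples; these are assumptions chosen only for
the demonstration.

```python
# Hypothetical sample strings; not taken from Mercurial's message catalog.
message = "変換中"            # stands in for a localized "hg convert" message
description = "日本語の説明"  # stands in for a revision description

# Assumption: the user's encoding ("orig_encoding") is cp932.
message_bytes = message.encode("utf-8")
description_bytes = description.encode("cp932")

# One output stream now carries two different encodings.
mixed = message_bytes + b" " + description_bytes

try:
    mixed.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    # The cp932 bytes are not valid utf-8, so the stream as a whole
    # cannot be interpreted consistently.
    decodes_as_utf8 = False
```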

this patch stops re-encoding descriptions in "orig_encoding", so that
all output is in utf-8. the exception is when "orig_encoding" is
"ascii": in that case descriptions are still re-encoded in "ascii"
for backward compatibility.
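
the resulting behavior can be sketched in Python 3 as follows. this is
only an approximation of the patched "recode()": the real function is
Python 2 code that reads a module-level "orig_encoding", whereas the
hypothetical "recode_sketch()" below takes it as a parameter and uses
"str"/"bytes" in place of "unicode"/"str".

```python
def recode_sketch(s, orig_encoding):
    """Approximate the patched recode() for illustration only."""
    if isinstance(s, str):
        # Text objects are always written out as utf-8.
        return s.encode("utf-8", "replace")
    elif orig_encoding == "ascii":
        # Backward compatibility: never emit non-ascii bytes.
        return s.decode("utf-8").encode(orig_encoding, "replace")
    else:
        # Byte strings are assumed to be utf-8 already; pass them through.
        return s


utf8_desc = "日本".encode("utf-8")
unchanged = recode_sketch(utf8_desc, "cp932")   # left as utf-8 bytes
ascii_safe = recode_sketch(utf8_desc, "ascii")  # non-ascii replaced by "?"
```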


the original implementation of "recode()" was introduced by changeset
4c16020d1172 (convert: print commit log message with local encoding
correctly): at that time, many of the messages shown by "hg convert"
were not yet internationalized, so encoding conflicts rarely occurred.

the check for unicode objects in "recode()" was introduced by
changeset e2cbdd931341.

diff -r 09dd707b522a -r 9bbc6c974eea hgext/convert/convcmd.py
--- a/hgext/convert/convcmd.py	Wed Apr 18 11:46:23 2012 -0500
+++ b/hgext/convert/convcmd.py	Tue Apr 24 21:52:15 2012 +0900
@@ -25,9 +25,12 @@
 
 def recode(s):
     if isinstance(s, unicode):
-        return s.encode(orig_encoding, 'replace')
+        return s.encode('utf-8', 'replace')
+    elif orig_encoding == 'ascii':
+        # avoid to show non-ascii characters
+        return s.decode('utf-8').encode(orig_encoding, 'replace')
     else:
-        return s.decode('utf-8').encode(orig_encoding, 'replace')
+        return s
 
 source_converters = [
     ('cvs', convert_cvs, 'branchsort'),
@@ -375,9 +378,7 @@
                 desc = self.commitcache[c].desc
                 if "\n" in desc:
                     desc = desc.splitlines()[0]
-                # convert log message to local encoding without using
-                # tolocal() because the encoding.encoding convert()
-                # uses is 'utf-8'
+                # use 'recode()' to ensure writing out byte sequence in UTF-8
                 self.ui.status("%d %s\n" % (num, recode(desc)))
                 self.ui.note(_("source: %s\n") % recode(c))
                 self.ui.progress(_('converting'), i, unit=_('revisions'),

