[PATCH] util: wrap lines with multi-byte characters correctly (issue2943)

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Fri Aug 26 14:45:31 CDT 2011


Hi, Mads.

At Sat, 06 Aug 2011 23:53:10 +0200,
Mads Kiilerich wrote:
> 
> # HG changeset patch
> # User Mads Kiilerich <mads at kiilerich.com>
> # Date 1312667540 -7200
> # Branch stable
> # Node ID 522ef2a25786c3666d4381d38944fe6d3aa64e5d
> # Parent  f32a2989ff585f0f452f25806750477fc631fc9a
> util: wrap lines with multi-byte characters correctly (issue2943)
> 
> This re-introduces the unicode conversion that was lost in d320e70442a5 5 years
> ago and had the comment:
>   To avoid corrupting multi-byte characters in line, we must wrap
>   a Unicode string instead of a bytestring.

Unfortunately, your patch does not work correctly for Japanese
characters encoded in either CP932 or UTF-8, because 'len()' on a
unicode string counts every character as narrow (= 1 column), even
when the character is an unambiguously wide one.
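
To illustrate the difference, here is a minimal sketch (not the actual
patch; 'display_width' is a hypothetical helper name used only for this
example) that compares 'len()' with a column count based on the
standard 'unicodedata.east_asian_width()' function. It assumes that
only the "W" (wide) and "F" (fullwidth) classes occupy 2 columns:

```python
import unicodedata


def display_width(text):
    """Return the number of terminal columns needed to show `text`.

    Assumption for this sketch: characters whose East Asian Width is
    "W" (wide) or "F" (fullwidth) take 2 columns, everything else 1.
    """
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


sample = u"\u65e5\u672c\u8a9e"  # three Japanese (kanji) characters

print(len(sample))            # 3: one per character
print(display_width(sample))  # 6: each character is 2 columns wide
```

A wrapper that measures lines with 'len()' would therefore let such a
line grow to about twice the intended width.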

But it is a fact that the implementation based on my patch (d320e70442a5)
does not work correctly for Russian characters encoded in UTF-8.

I'll quickly post an additional patch to fix the problem for Japanese
characters (and maybe other East Asian characters) without a
regression for Russian characters, so please check and comment on it!

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp

