[PATCH] util: wrap lines with multi-byte characters correctly (issue2943)
FUJIWARA Katsunori
foozy at lares.dti.ne.jp
Fri Aug 26 14:45:31 CDT 2011
Hi, Mads.
At Sat, 06 Aug 2011 23:53:10 +0200,
Mads Kiilerich wrote:
>
> # HG changeset patch
> # User Mads Kiilerich <mads at kiilerich.com>
> # Date 1312667540 -7200
> # Branch stable
> # Node ID 522ef2a25786c3666d4381d38944fe6d3aa64e5d
> # Parent f32a2989ff585f0f452f25806750477fc631fc9a
> util: wrap lines with multi-byte characters correctly (issue2943)
>
> This re-introduces the unicode conversion what was lost in d320e70442a5 5 years
> ago and had the comment:
> To avoid corrupting multi-byte characters in line, we must wrap
> a Unicode string instead of a bytestring.
Unfortunately, your patch does not work correctly for Japanese
characters encoded in either CP932 or UTF-8, because 'len()' on a
Unicode string counts every character as narrow (= 1 column), even
when it is a non-ambiguous wide one.
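To illustrate the width problem, here is a rough sketch. It is not the
code of either patch: 'display_width' is a hypothetical helper name, and
counting only the "W" and "F" East Asian Width classes as 2 columns
(so that ambiguous-width characters such as Cyrillic letters stay at 1)
is an assumption made for this example.

```python
import unicodedata


def display_width(text):
    """Return the number of terminal columns 'text' is assumed to occupy.

    Hypothetical helper: Wide ("W") and Fullwidth ("F") characters count
    as 2 columns, everything else (including Ambiguous "A") as 1.
    """
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


japanese = "\u65e5\u672c\u8a9e"  # three kanji, each of class "W"
russian = "\u0440\u0443\u0441"   # three Cyrillic letters, each of class "A"

# len() counts characters, not columns, so it under-reports wide text.
print(len(japanese), display_width(japanese))
print(len(russian), display_width(russian))
```

A wrapper that measures lines with 'len()' would therefore let Japanese
text run up to twice as wide as the requested width.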
However, it is true that the implementation based on my patch
(d320e70442a5) does not work correctly for Russian characters encoded
in UTF-8.
I'll quickly post an additional patch to fix the problem for Japanese
characters (and maybe other East Asian characters) without causing a
regression for Russian characters, so please check and comment on it!
----------------------------------------------------------------------
[FUJIWARA Katsunori] foozy at lares.dti.ne.jp