[PATCH] util: wrap lines with multi-byte characters correctly (issue2943)
Mads Kiilerich
mads at kiilerich.com
Sat Aug 6 17:26:34 CDT 2011
Mads Kiilerich wrote, On 08/06/2011 11:53 PM:
> # HG changeset patch
> # User Mads Kiilerich<mads at kiilerich.com>
> # Date 1312667540 -7200
> # Branch stable
> # Node ID 522ef2a25786c3666d4381d38944fe6d3aa64e5d
> # Parent f32a2989ff585f0f452f25806750477fc631fc9a
> util: wrap lines with multi-byte characters correctly (issue2943)
I don't know if this qualifies for stable?
> This re-introduces the unicode conversion what was lost in d320e70442a5 5 years
Regression fixes should go to stable? ;-)
> ago and had the comment:
> To avoid corrupting multi-byte characters in line, we must wrap
> a Unicode string instead of a bytestring.
>
> diff --git a/mercurial/util.py b/mercurial/util.py
> --- a/mercurial/util.py
> +++ b/mercurial/util.py
> @@ -1148,16 +1148,14 @@
> def __init__(self, **kwargs):
> textwrap.TextWrapper.__init__(self, **kwargs)
>
> - def _cutdown(self, str, space_left):
> + def _cutdown(self, ucstr, space_left):
> l = 0
> - ucstr = unicode(str, encoding.encoding)
> colwidth = unicodedata.east_asian_width
> for i in xrange(len(ucstr)):
> l += colwidth(ucstr[i]) in 'WFA' and 2 or 1
> if space_left< l:
> - return (ucstr[:i].encode(encoding.encoding),
> - ucstr[i:].encode(encoding.encoding))
> - return str, ''
> + return (ucstr[:i], ucstr[i:])
> + return ucstr, ''
>
> # overriding of base class
> def _handle_long_word(self, reversed_chunks, cur_line, cur_len, width):
> @@ -1179,10 +1177,13 @@
> if width<= maxindent:
> # adjust for weird terminal size
> width = max(78, maxindent + 1)
> + line = line.decode(encoding.encoding, encoding.encodingmode)
> + initindent = initindent.decode(encoding.encoding, encoding.encodingmode)
> + hangindent = hangindent.decode(encoding.encoding, encoding.encodingmode)
> wrapper = MBTextWrapper(width=width,
> initial_indent=initindent,
> subsequent_indent=hangindent)
> - return wrapper.fill(line)
> + return wrapper.fill(line).encode(encoding.encoding)
>
> def iterlines(iterator):
> for chunk in iterator:
> diff --git a/tests/test-encoding-align.t b/tests/test-encoding-align.t
> --- a/tests/test-encoding-align.t
> +++ b/tests/test-encoding-align.t
> @@ -22,14 +22,14 @@
> > cmdtable = {
> > 'showoptlist':
> > (showoptlist,
> -> [('s', 'opt1', '', 'short width', '""" + s + """'),
> -> ('m', 'opt2', '', 'middle width', '""" + m + """'),
> -> ('l', 'opt3', '', 'long width', '""" + l + """')
> +> [('s', 'opt1', '', 'short width' + ' %(s)s' * 8, '%(s)s'),
> +> ('m', 'opt2', '', 'middle width' + ' %(m)s' * 8, '%(m)s'),
> +> ('l', 'opt3', '', 'long width' + ' %(l)s' * 8, '%(l)s')
> > ],
> > ""
> > )
> > }
> -> """)
> +> """ % globals())
> > f.close()
> > EOF
> $ S=`cat s`
> @@ -52,9 +52,11 @@
>
> options:
>
> - -s --opt1 \xe7\x9f\xad\xe5\x90\x8d short width (esc)
> - -m --opt2 MIDDLE_ middle width
> - -l --opt3 \xe9\x95\xb7\xe3\x81\x84\xe9\x95\xb7\xe3\x81\x84\xe5\x90\x8d\xe5\x89\x8d long width (esc)
> + -s --opt1 \xe7\x9f\xad\xe5\x90\x8d short width \xe7\x9f\xad\xe5\x90\x8d \xe7\x9f\xad\xe5\x90\x8d \xe7\x9f\xad\xe5\x90\x8d \xe7\x9f\xad\xe5\x90\x8d \xe7\x9f\xad\xe5\x90\x8d \xe7\x9f\xad\xe5\x90\x8d \xe7\x9f\xad\xe5\x90\x8d \xe7\x9f\xad\xe5\x90\x8d (esc)
> + -m --opt2 MIDDLE_ middle width MIDDLE_ MIDDLE_ MIDDLE_ MIDDLE_ MIDDLE_
> + MIDDLE_ MIDDLE_ MIDDLE_
> + -l --opt3 \xe9\x95\xb7\xe3\x81\x84\xe9\x95\xb7\xe3\x81\x84\xe5\x90\x8d\xe5\x89\x8d long width \xe9\x95\xb7\xe3\x81\x84\xe9\x95\xb7\xe3\x81\x84\xe5\x90\x8d\xe5\x89\x8d \xe9\x95\xb7\xe3\x81\x84\xe9\x95\xb7\xe3\x81\x84\xe5\x90\x8d\xe5\x89\x8d \xe9\x95\xb7\xe3\x81\x84\xe9\x95\xb7\xe3\x81\x84\xe5\x90\x8d\xe5\x89\x8d \xe9\x95\xb7\xe3\x81\x84\xe9\x95\xb7\xe3\x81\x84\xe5\x90\x8d\xe5\x89\x8d \xe9\x95\xb7\xe3\x81\x84\xe9\x95\xb7\xe3\x81\x84\xe5\x90\x8d\xe5\x89\x8d \xe9\x95\xb7\xe3\x81\x84\xe9\x95\xb7\xe3\x81\x84\xe5\x90\x8d\xe5\x89\x8d \xe9\x95\xb7\xe3\x81\x84\xe9\x95\xb7\xe3\x81\x84\xe5\x90\x8d\xe5\x89\x8d (esc)
> + \xe9\x95\xb7\xe3\x81\x84\xe9\x95\xb7\xe3\x81\x84\xe5\x90\x8d\xe5\x89\x8d (esc)
-s --opt1 短名 short width 短名 短名 短名 短名 短名 短名 短名 短名
-m --opt2 MIDDLE_ middle width MIDDLE_ MIDDLE_ MIDDLE_ MIDDLE_ MIDDLE_
-l --opt3 長い長い名前 long width 長い長い名前 長い長い名前 長い長い名前
長い長い名前 長い長い名前 長い長い名前 長い長い名前
The last line is too long, apparently because some characters (by
definition length 1) has width 2 when rendered, but that is ignored by
the line-length-calculator. AFAICS that is different issue and something
this fix just happened to reveal: that d320e70442a5 ignored the not
self.break_long_words case.
/Mads
More information about the Mercurial-devel
mailing list