[PATCH 0 of 1] replace Python standard textwrap by MBCS sensitive one for i18n text

FUJIWARA Katsunori fujiwara at ascade.co.jp
Sun May 16 06:15:26 CDT 2010

Mercurial has a problem with text wrapping/filling in MBCS encoding
environments, because the standard 'textwrap' module of Python cannot
handle such text correctly. It can split the byte sequence of a single
character in two.

I wrote this patch to replace the Python standard textwrap with an
MBCS-aware one.
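
To illustrate the kind of breakage described above, here is a minimal
sketch. It is not taken from the patch; the sample string, the choice
of EUC-JP and the use of Python 3 are assumptions for illustration only.

```python
import textwrap

text = "日本語のテキスト"       # 8 characters
raw = text.encode("euc-jp")     # 2 bytes per character in EUC-JP

# Cutting the byte string at an odd offset lands inside a character,
# so the first piece can no longer be decoded.
head = raw[:5]
try:
    head.decode("euc-jp")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False

# Wrapping the decoded text never splits a character, although
# textwrap still counts characters rather than display columns.
lines = textwrap.wrap(text, width=5)
```

Note that wrapping decoded text only avoids broken characters. It does
not yet account for characters that occupy two columns.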

# This can be applied only on default (= non-stable),
# because the diff context in minirst.py differs there.

This seems to work correctly, but I am worried about how to determine
the column width of East Asian characters in Unicode.

# http://www.unicode.org/reports/tr11/

According to the Unicode specification, the possible values of the
"East Asian Width" property are:

   W(ide), Na(rrow), N(eutral), F(ullwidth), H(alfwidth), A(mbiguous)

W/F can always be treated as 2 columns wide and Na/N/H as 1 column,
but 'A' can not. The width of 'A' depends on the language in which it
is used.
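
As a quick illustration, Python's standard 'unicodedata' module exposes
this property. The sample characters below are my own choice and are
not taken from the patch.

```python
import unicodedata

# Sample characters chosen for illustration only.
samples = ["A", "あ", "Ａ", "ｱ", "α"]

widths = {ch: unicodedata.east_asian_width(ch) for ch in samples}

# 'A'  (U+0041, Latin capital A)        -> 'Na'
# 'あ' (U+3042, hiragana A)             -> 'W'
# 'Ａ' (U+FF21, fullwidth Latin A)      -> 'F'
# 'ｱ'  (U+FF71, halfwidth katakana A)   -> 'H'
# 'α'  (U+03B1, Greek small alpha)      -> 'A'
for ch, kind in widths.items():
    print("U+%04X %s" % (ord(ch), kind))
```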

The Unicode specification says:

   If the context(= language) cannot be established reliably they
   should be treated as narrow characters by default

but many 'A' characters are full-width, at least in Japanese.

In this patch, I introduce the environment variable 'HGUCACWIDTH' to
specify the UniCode Ambiguous Character WIDTH.
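
A minimal sketch of how such a variable could feed a column width
calculation follows. It is not the code from the patch: the helper
names 'ambiguous_width' and 'colwidth' are hypothetical, and it assumes
that HGUCACWIDTH holds '1' or '2', which this mail does not state.

```python
import os
import unicodedata


def ambiguous_width(environ=os.environ):
    """Width used for 'A' characters.

    Assumption: HGUCACWIDTH is '2' for wide; anything else means 1.
    """
    return 2 if environ.get("HGUCACWIDTH", "1") == "2" else 1


def colwidth(text, environ=os.environ):
    """Display columns occupied by text (hypothetical helper)."""
    amb = ambiguous_width(environ)
    total = 0
    for ch in text:
        kind = unicodedata.east_asian_width(ch)
        if kind in ("W", "F"):
            total += 2          # wide and fullwidth
        elif kind == "A":
            total += amb        # ambiguous: decided by the environment
        else:
            total += 1          # Na, N and H
    return total
```

For example, colwidth("α", {"HGUCACWIDTH": "2"}) gives 2, while
colwidth("α", {}) gives 1. This sketch ignores combining characters,
which a real implementation would probably need to treat as width 0.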

If there is any other easy (and appropriate) way to determine it in
Python code, please let me know!

If there are only a few languages other than Japanese which require 2
(or more) bytes for an 'A' character, managing a language lookup table
also seems reasonable in terms of use and maintenance.
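
Such a lookup table might be sketched as below. The table contents
('ja', 'zh', 'ko') and the function name are assumptions made for
illustration. I have not verified which languages actually need wide
'A' characters.

```python
# Assumption: languages whose 'A' characters are treated as wide.
WIDE_AMBIGUOUS_LANGS = {"ja", "zh", "ko"}


def ambiguous_width_for(locale_name):
    """Return 2 if the locale's language is in the table, else 1.

    locale_name is a POSIX-style name such as 'ja_JP.UTF-8'.
    """
    lang = locale_name.split("_")[0].split(".")[0].lower()
    return 2 if lang in WIDE_AMBIGUOUS_LANGS else 1
```

With this table, ambiguous_width_for("ja_JP.UTF-8") returns 2 and
ambiguous_width_for("en_US.UTF-8") returns 1.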

