[PATCH 0 of 1] replace Python standard textwrap by MBCS sensitive one for i18n text

FUJIWARA Katsunori fujiwara at ascade.co.jp
Mon May 17 10:58:44 CDT 2010


Hi, Martin:

At Sun, 16 May 2010 17:33:50 +0200,
Martin Geisler wrote:

> > Mercurial has problem around text wrapping/filling in MBCS encoding
> > environment, because standard 'textwrap' module of Python can not
> > treat it correctly. It splits byte sequence for one character into two
> > lines.
> 
> Right, that's why we decode the bytestrings into Unicode strings in
> util.wrap -- I guess we should have used that all over the place in
> minirst too.
> 
> When that problem is solved, the problem of computing the length of the
> string remains. In your patch, you override _handle_long_word in the
> textwrap class. I don't think that is 100% correct: the original class
> will only call _handle_long_word when it detects that the chunk is long,
> i.e., when it has computed the length incorrectly and determined that
> the word is too large for the line width.

You are right. Line wrapping that is appropriate for almost all
languages seems too difficult, and too large, for a Mercurial patch,
so this patch focuses only on MBCS safety.
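
To illustrate what "MBCS safety" means here, the following is a minimal
sketch, not the code from the patch. It assumes Python 3 syntax and a
hypothetical helper name, wrap_mbcs_safe. The idea is the one Martin
describes for util.wrap: decode the byte string first, wrap the Unicode
text, then encode each line again, so that a line break can never fall
inside the byte sequence of one character. It counts characters, not
display columns.

```python
import textwrap


def wrap_mbcs_safe(data, width, encoding="utf-8"):
    """Wrap an encoded byte string without splitting a multi-byte character.

    Hypothetical helper for illustration only. 'width' is counted in
    characters here, not in display columns.
    """
    # Decode first, so textwrap operates on whole characters, not bytes.
    text = data.decode(encoding)
    lines = textwrap.wrap(text, width)
    # Re-encode each wrapped line back to the original encoding.
    return [line.encode(encoding) for line in lines]
```

Wrapping the raw bytes instead would break a line after a fixed number of
bytes, which for a 3-byte UTF-8 character can leave one or two of its
bytes on the next line.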

In fact, I cannot tell how strictly people expect help text on a
tty to be wrapped, and I do not know the details of line wrapping
rules for other East Asian languages :-)

So, this patch introduces "filltext"/"wraptext" hook points in
i18n.py for easy overriding.
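
A rough sketch of how such hook points can work is shown below. Only the
names "filltext" and "wraptext" come from this patch; the signatures, the
default bodies, the stand-in namespace "i18n" and the caller render_help
are assumptions made for illustration, written in Python 3 syntax.

```python
import textwrap
import types

# Stand-in for a module that exposes the hook points.
# This is NOT the real mercurial.i18n module.
i18n = types.SimpleNamespace()


def default_wraptext(text, width):
    """Assumed default: split text into a list of wrapped lines."""
    return textwrap.wrap(text, width)


def default_filltext(text, width):
    """Assumed default: join the wrapped lines into one string."""
    # Look wraptext up through the namespace so an override is honoured.
    return "\n".join(i18n.wraptext(text, width))


i18n.wraptext = default_wraptext
i18n.filltext = default_filltext


def render_help(text, width=30):
    """Hypothetical caller: resolves the hook at call time."""
    return i18n.filltext(text, width)
```

Because callers resolve the hook at call time, assigning a custom function
to i18n.wraptext changes the result of every later render_help call
without touching the callers.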

> > Unicode specification says:
> >
> >    If the context(= language) cannot be established reliably they
> >    should be treated as narrow characters by default
> >
> > but many of 'A' characters are full-width, at least, in Japanese
> > environment.
> >
> > In this patch, I introduce environment variable 'HGUCACWIDTH' to
> > determine UniCode Ambiguous Character WIDTH.
> 
> Could we not just always treat them as full-width? That would mean that
> some strings are wrapped too soon, but I don't see that as a problem. It
> will only give the text a slightly more ragged appearance. The good
> thing is that we would avoid using another environment variable.

"Always treating them as full-width" seems to be a good solution.

For people who want stricter line wrapping, overriding
filltext/wraptext in the mercurial/i18n module with a custom
implementation would be better than a new environment variable.
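
The "always full-width" rule can be sketched with the standard
unicodedata module, as below. This is an illustration under assumptions,
not the code from the patch: the helper names column_width and
wrap_by_columns are hypothetical, the syntax is Python 3, the breaking is
done per character rather than per word, and combining characters are not
handled.

```python
import unicodedata


def column_width(text):
    """Count display columns, treating Ambiguous ('A') as full-width.

    Wide ('W'), Fullwidth ('F') and Ambiguous ('A') characters count as
    two columns; everything else counts as one.
    """
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F", "A") else 1
        for ch in text
    )


def wrap_by_columns(text, width):
    """Break text into lines that each fit within 'width' columns."""
    lines, current, used = [], "", 0
    for ch in text:
        w = column_width(ch)
        # Start a new line when this character would overflow the width.
        if current and used + w > width:
            lines.append(current)
            current, used = "", 0
        current += ch
        used += w
    if current:
        lines.append(current)
    return lines
```

Counting ambiguous characters as two columns can only overestimate the
width of a line, so the worst case is a line that is wrapped slightly too
early, as Martin notes.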

--------------------
[FUJIWARA Katsunori]      fujiwara at ascade.co.jp(foozy at lares.dti.ne.jp)


