[PATCH 2 of 2 stable] util: fix ellipsis() not to break multi-byte sequence (issue2564)

Yuya Nishihara yuya at tcha.org
Thu Dec 23 23:42:23 CST 2010


Matt Mackall wrote:
> On Fri, 2010-12-24 at 01:37 +0900, Yuya Nishihara wrote:
> > # HG changeset patch
> > # User Yuya Nishihara <yuya at tcha.org>
> > # Date 1293121040 -32400
> > # Branch stable
> > # Node ID 2ab92b58076868e42c632828b2487cabe2823e8e
> > # Parent  0ed736fe75b467ad9191f2ef52129992381659e5
> > util: fix ellipsis() not to break multi-byte sequence (issue2564)
> > 
> > diff --git a/mercurial/util.py b/mercurial/util.py
> > --- a/mercurial/util.py
> > +++ b/mercurial/util.py
> > @@ -1202,10 +1202,13 @@ def email(author):
> >  
> >  def ellipsis(text, maxlength=400):
> >      """Trim string to at most maxlength (default: 400) characters."""
> > -    if len(text) <= maxlength:
> > +    utext = encoding.fromlocal(text).decode('utf-8')
> 
> This assumes 'text' is in utf-8, which is not how strings in Mercurial
> generally work:

encoding.fromlocal() converts 'text' from the local encoding to a
utf-8-encoded string before the .decode('utf-8') is applied.
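
To illustrate the two steps, here is a minimal Python 3 sketch. It is not
Mercurial's code: to_utf8() is a hypothetical stand-in for
encoding.fromlocal(), the local encoding is assumed to be euc-jp, and the
truncation logic is only one plausible way to trim on character boundaries.

```python
# Hypothetical stand-in for encoding.fromlocal(); not Mercurial's code.
LOCAL_ENCODING = 'euc-jp'  # assumed local encoding for this example


def to_utf8(local_bytes):
    """Re-encode a byte string from the local encoding to UTF-8."""
    return local_bytes.decode(LOCAL_ENCODING).encode('utf-8')


def ellipsis(text, maxlength=400):
    """Trim a local-encoded byte string to at most maxlength characters."""
    # Step 1: local bytes -> UTF-8 bytes; step 2: UTF-8 bytes -> unicode.
    utext = to_utf8(text).decode('utf-8')
    if len(utext) <= maxlength:
        return text
    # Slice by characters, so a multi-byte sequence is never cut in half.
    return (utext[:maxlength - 3] + '...').encode(LOCAL_ENCODING)


local = 'あいうえお'.encode(LOCAL_ENCODING)  # 5 characters, 10 bytes
trimmed = ellipsis(local, maxlength=4)
```

Slicing the byte string directly (local[:1]) would leave half of the first
character behind, which is the breakage reported in issue2564.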

And IMO it's better than

    text.decode(encoding.encoding, 'replace')

because encoding.fromlocal() can convert losslessly.
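
What 'replace' throws away can be seen in plain Python 3. This sketch says
nothing about how encoding.fromlocal() keeps the original data; it only
shows the loss on the other side. The local encoding is assumed to be ascii
here.

```python
# Assumed for this example: the local encoding is ascii.
raw = b'caf\xe9'  # contains one byte that is not valid ascii

# Decoding with 'replace' swaps the undecodable byte for U+FFFD ...
lossy = raw.decode('ascii', 'replace')

# ... so encoding again cannot reproduce the original bytes.
roundtrip = lossy.encode('ascii', 'replace')
```

After this, roundtrip differs from raw, so the original byte cannot be
recovered from the decoded string.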

> http://mercurial.selenic.com/wiki/EncodingStrategy

Regards,

Yuya

