Bug in description handling

Martijn Pieters mj at zopatista.com
Fri Aug 13 14:14:56 CDT 2010


On Fri, Aug 13, 2010 at 18:38, Matt Mackall <mpm at selenic.com> wrote:
>> This appears to be caused by Mercurial calling locale.setlocale(), which affects strip() in 2.6 and earlier.
>
> Good catch - I just went through Python's SVN source browser and
> convinced myself this behavior was impossible in tip.
>
>> Given this little scriptlet:
>>
>> print [c for c in range(256) if chr(c) != chr(c).rstrip()]
>> import locale
>> locale.setlocale(locale.LC_CTYPE, "en_US.UTF-8")
>> print [c for c in range(256) if chr(c) != chr(c).rstrip()]
>>
>> Python 2.6.5 outputs:
>>
>> [9, 10, 11, 12, 13, 32]
>> [9, 10, 11, 12, 13, 32, 133, 160]

Right, I just came to a similar conclusion, after pinning the problem
to the mercurial.encoding module.

So the workaround is either to use Python 2.7 or to avoid the
affected characters at the end of a line. Only a handful of them are
in the Latin-1 range: bytes 133 (NEL) and 160 (NO-BREAK SPACE) in the
output above.
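Another option inside Mercurial would be to avoid the locale-dependent default entirely. When rstrip() is given an explicit set of characters, it strips exactly those characters and does not consult the C library's isspace(). The sketch below is a hypothetical helper, not Mercurial's code. The names `ASCII_WHITESPACE` and `asciirstrip` are illustrative only, and the code assumes the data is handled as byte strings, as in the scriptlet above.

```python
# Hypothetical locale-independent replacement for the bare str.rstrip().
# These are the six bytes the scriptlet reports as whitespace before
# setlocale() is called: 9-13 and 32.
ASCII_WHITESPACE = b"\t\n\x0b\x0c\r "

def asciirstrip(s):
    """Strip trailing ASCII whitespace only, whatever LC_CTYPE is set to."""
    # With an explicit character set, rstrip() removes only these bytes,
    # so 0x85 and 0xa0 are never treated as whitespace.
    return s.rstrip(ASCII_WHITESPACE)

# Same check as the scriptlet, using the helper instead of rstrip().
# bytes(bytearray([c])) yields a one-byte string on Python 2 and 3.
stripped = [c for c in range(256)
            if asciirstrip(bytes(bytearray([c]))) != bytes(bytearray([c]))]
print(stripped)
```

Because the character set is fixed, this list should stay the same with or without a setlocale() call.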

> And that's obviously a bug in Python because the bytes > 127 are going
> to be part of multibyte characters.
>
> Notably, we only call setlocale on OS X, so we can probably find a way
> to ditch this hack. But we'll probably need someone with a Mac (hint
> hint) to figure out how.
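A small example shows why stripping bytes above 127 corrupts UTF-8 text. The sketch below simulates the locale-affected rstrip() by passing the extra bytes 133 and 160 (from the output quoted earlier) explicitly. It does not reproduce the platform bug itself. The sample string "voilà" is an arbitrary choice.

```python
# "à" (U+00E0) encodes in UTF-8 as the two bytes 0xC3 0xA0, so byte 160
# shows up here as the second half of a multibyte character.
data = u"voil\u00e0".encode("utf-8")

# Simulated locale-affected rstrip(): the six ASCII whitespace bytes
# plus 0x85 and 0xA0, which the affected setup also treats as whitespace.
buggy = data.rstrip(b"\t\n\x0b\x0c\r \x85\xa0")

# Stripping only ASCII whitespace leaves the character intact.
safe = data.rstrip(b"\t\n\x0b\x0c\r ")

try:
    buggy.decode("utf-8")
except UnicodeDecodeError:
    # The trailing 0xA0 was removed, leaving a lone 0xC3 lead byte.
    print("buggy: truncated UTF-8 sequence")
print(safe.decode("utf-8") == u"voil\u00e0")
```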

I can look into porting http://bugs.python.org/issue6202 for use in
Mercurial. In any case, locale.getpreferredencoding() has been fixed in Python
2.7, so the fix only needs to be backported for Python 2.6 and earlier.
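Until such a backport exists, one possible stopgap is worth noting. It assumes the OS X setlocale() call is only there to discover the user's encoding. In that case, the original LC_CTYPE could be restored right after the lookup, so that later byte-string strip() calls run under the previous locale again. The sketch below is hypothetical: `detect_encoding` is an illustrative name, and this is not the approach taken in http://bugs.python.org/issue6202.

```python
import locale

def detect_encoding(default="ascii"):
    """Hypothetical helper: read the encoding implied by the user's
    environment without leaving LC_CTYPE changed for the process."""
    # With a single argument, setlocale() only queries the current value.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        # Adopt the locale from the environment (LC_ALL, LC_CTYPE, LANG).
        locale.setlocale(locale.LC_CTYPE, "")
        encoding = locale.getlocale(locale.LC_CTYPE)[1]
    except (locale.Error, ValueError):
        # Unsupported or unparseable locale settings in the environment.
        encoding = None
    finally:
        # Put the original LC_CTYPE back so strip() and friends see the
        # same locale as before the lookup.
        locale.setlocale(locale.LC_CTYPE, saved)
    return encoding or default
```

setlocale() changes process-wide state, so a helper like this would need to run once at startup, before any other threads exist.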

-- 
Martijn Pieters
