Bug in description handling

Benoit Boissinot bboissin at gmail.com
Fri Aug 13 09:05:26 CDT 2010


On Fri, Aug 13, 2010 at 3:42 PM, Martijn Pieters <mj at zopatista.com> wrote:
> I've found the following error in the handling of UTF-8 description
> text when it is passed through changelog.add.
>
> Given a line ending in the UTF-8 character '\xc3\x85' (the letter Å),
> .rstrip() on this UTF-8 encoded bytestring will remove the \x85 byte,
> as that byte is being treated as whitespace (it is the C1 control
> code NEL, which lies outside ASCII). This means that descriptions
> ending in non-ASCII characters run the risk of being corrupted,
> because the description is being rstripped without decoding it to
> Unicode first:
>
>      # strip trailing whitespace and leading and trailing empty lines
>      desc = '\n'.join([l.rstrip() for l in desc.splitlines()]).strip('\n')
>
> (from changelog.py lines 212-213 in my copy).
>
> Could we come up with a changelog stripping method that operates on
> the decoded unicode instead?
>

Can't reproduce here:
>>> '\x85'.rstrip()
'\x85'
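
To pin down which kind of strip actually eats the byte, here is a minimal
sketch. It assumes Python 3 semantics (bytes and text are separate types),
and strip_description is a hypothetical helper name, not the code in
changelog.py. It only strips an explicit set of ASCII whitespace bytes, so
the result should not depend on how \x85 happens to be classified:

```python
# Sketch only (Python 3). strip_description is a hypothetical helper,
# not the code in Mercurial's changelog.py.
ASCII_WS = b" \t\n\r\x0b\x0c"

def strip_description(desc):
    """Strip trailing ASCII whitespace per line, and blank lines at both ends."""
    lines = [l.rstrip(ASCII_WS) for l in desc.splitlines()]
    return b"\n".join(lines).strip(b"\n")

a_ring = "\u00c5".encode("utf-8")  # the two bytes \xc3 \x85

# Byte-level stripping leaves the trailing \x85 alone.
bytes_result = a_ring.rstrip()

# Text-level stripping treats U+0085 (NEL) as whitespace, so decoding
# with the wrong codec and then stripping loses the second byte.
latin1_result = a_ring.decode("latin-1").rstrip().encode("latin-1")

raw = b"\n" + a_ring + b"ngstr\xc3\xb6m \t\nends with " + a_ring + b"  \n\n"
cleaned = strip_description(raw)
print(repr(bytes_result), repr(latin1_result), repr(cleaned))
```

If the description really is corrupted on your side, comparing these three
values on your machine may show whether the byte is lost in a byte-level
strip or somewhere the data is handled as decoded text.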

Benoit


More information about the Mercurial-devel mailing list