Bug in description handling

Martijn Pieters mj at zopatista.com
Fri Aug 13 08:42:15 CDT 2010


I've found the following bug in the handling of UTF-8 description text
passed through changelog.add.

Given a line ending in the UTF-8 character '\xc3\x85' (the letter Å),
.rstrip() on this UTF-8 encoded bytestring can remove the \x85 byte.
On its own, \x85 is NEL (Next Line), a C1 control character outside
ASCII that a locale-dependent whitespace test may classify as
whitespace. This means that descriptions ending in non-ASCII
characters risk being corrupted, because each line is rstripped as
raw bytes without first being decoded to Unicode:

      # strip trailing whitespace and leading and trailing empty lines
      desc = '\n'.join([l.rstrip() for l in desc.splitlines()]).strip('\n')

(from changelog.py lines 212-213 in my copy).
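A minimal Python 3 sketch of the failure mode follows. Python 3's bytes.rstrip() with no argument strips only ASCII whitespace, so the sketch assumes a whitespace set that includes 0x85. That set simulates a platform whose isspace() counts NEL as whitespace; it is not taken from Mercurial.

```python
# Å is U+00C5; its UTF-8 encoding is two bytes, the second being 0x85.
raw = "\u00c5".encode("utf-8")
print(raw)  # b'\xc3\x85'

# Assumption: simulate a locale whose whitespace test counts 0x85 (NEL)
# as whitespace by passing that byte explicitly to rstrip().
WHITESPACE_WITH_NEL = b" \t\n\r\x0b\x0c\x85"
stripped = raw.rstrip(WHITESPACE_WITH_NEL)
print(stripped)  # b'\xc3'

# The lone lead byte 0xC3 is no longer valid UTF-8.
try:
    stripped.decode("utf-8")
except UnicodeDecodeError as exc:
    print("corrupted:", exc.reason)
```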

Could we come up with a changelog stripping method that operates on
the decoded unicode instead?
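One possible shape for such a method is sketched below in Python 3. The name strip_description is hypothetical and not an existing Mercurial function. The UTF-8 default is also an assumption; real code would decode with whatever encoding the description actually arrives in. The transformation is the one from the snippet above, applied to decoded text and then re-encoded:

```python
def strip_description(desc: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: strip trailing whitespace from every line and
    drop leading/trailing empty lines, working on decoded text."""
    # Decode first so multi-byte characters are whole code points;
    # invalid input raises UnicodeDecodeError instead of being mangled.
    text = desc.decode(encoding)
    # Same transformation as the changelog.py snippet, but on str.
    text = "\n".join(line.rstrip() for line in text.splitlines()).strip("\n")
    return text.encode(encoding)


print(strip_description("line one  \n\ncaf\u00c5 \n\n".encode("utf-8")))
# b'line one\n\ncaf\xc3\x85'
```

One design point to weigh: str.splitlines() treats more characters as line boundaries than the bytestring version does, for example \x0b, \x0c, \x1c to \x1e, U+0085, U+2028 and U+2029. A description containing any of these would be split there and rejoined with '\n'.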

-- 
Martijn Pieters


More information about the Mercurial-devel mailing list