Finding latent encoding bugs

Wed Oct 29 07:20:20 CDT 2008

* Benoit Boissinot on Wednesday, October 29, 2008 at 02:25:40 +0100
> On Wed, Oct 29, 2008 at 1:23 AM, Matt Mackall <mpm at selenic.com> wrote:
>> Python likes to pretend that Unicode objects are just like strings, an
>> idea that seems nice in theory, but generally results in code working
>> for the developer but not in the field. Because Unicode strings can
>> 'infect' normal strings, the bug can crop up far from where the Unicode
>> string was introduced.
>> 
>> So we try to follow three guidelines:
>> 
>> (a) never pass Unicode objects inside hg, only utf-8 or local strings
>> (b) explicitly transcode strings (with util.tolocal or fromlocal)
>> (c) minimize transcoding by doing everything in the local encoding where
>> possible, centralizing transcoding to the (very few) places that need it
>> 
>> But because it's so easy for Unicode strings to sneak in when dealing
>> with encodings and third-party code, I've come up with the following
>> hack to quickly find all the spots where Unicode strings are getting
>> transparently converted to regular strings or vice-versa, most of which
>> are potential bugs if we encounter characters we can't convert:
>> 
> [snip]
>> Failed test-notify-changegroup: output changed
> 
> Regarding this one, at least the apparent failure isn't from us: the traceback
> is generated during the loading of emails.Headers.

Loading any of the email modules, in any manner, causes this.

The error in test-keyword is caused by running notify.

The error in highlight is caused by loading lexer.py from
pygments.
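
For reference, guidelines (a) and (b) quoted above could be sketched
roughly as below. This is only an illustration, not Mercurial's code: the
names tolocal and fromlocal are borrowed from the guideline text, but the
bodies, the encoding parameter and LOCAL_ENCODING are my own assumptions.
It is also written for Python 3, where byte strings and Unicode strings no
longer mix silently, whereas the thread is about Python 2.

```python
import locale

# Assumption: the "local" encoding is whatever the locale reports.
LOCAL_ENCODING = locale.getpreferredencoding(False) or "ascii"


def _require_bytes(data):
    # Guideline (a): only byte strings are passed around, never Unicode objects.
    if not isinstance(data, bytes):
        raise TypeError("expected a byte string, got %s" % type(data).__name__)


def tolocal(data, encoding=LOCAL_ENCODING):
    """Hypothetical helper: transcode UTF-8 bytes to the local encoding."""
    _require_bytes(data)
    # Guideline (b): the conversion is explicit and fails loudly if a
    # character cannot be represented in the target encoding.
    return data.decode("utf-8").encode(encoding)


def fromlocal(data, encoding=LOCAL_ENCODING):
    """Hypothetical helper: transcode local-encoding bytes to UTF-8."""
    _require_bytes(data)
    return data.decode(encoding).encode("utf-8")


# Round trip through an explicitly chosen local encoding.
utf8_text = "caf\u00e9".encode("utf-8")
local_text = tolocal(utf8_text, "latin-1")
assert fromlocal(local_text, "latin-1") == utf8_text
```

The point of the type check is that a stray Unicode object fails at the
boundary where it enters, rather than far from where it was introduced.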

c
-- 
  Was heißt hier Dogma, ich bin Underdogma!
[ What the hell do you mean dogma, I am underdogma. ]

_F R E E_  _V I D E O S_  -->>  http://www.blacktrash.org/underdogma/