Finding latent encoding bugs

Tue Oct 28 20:25:40 CDT 2008

On Wed, Oct 29, 2008 at 1:23 AM, Matt Mackall <mpm at selenic.com> wrote:
> Python likes to pretend that Unicode objects are just like strings, an
> idea that seems nice in principle, but generally results in code working
> for the developer but not in the field. Because Unicode strings can
> 'infect' normal strings, the bug can crop up far from where the Unicode
> string was introduced.
>
> So we try to follow three guidelines:
>
> (a) never pass Unicode objects inside hg, only utf-8 or local strings
> (b) explicitly transcode strings (with util.tolocal or fromlocal)
> (c) minimize transcoding by doing everything in the local encoding where
> possible, centralizing transcoding to the (very few) places that need it
>
> But because it's so easy for Unicode strings to sneak in when dealing
> with encodings and third-party code, I've come up with the following
> hack to quickly find all the spots where Unicode strings are getting
> transparently converted to regular strings or vice-versa, most of which
> are potential bugs if we encounter characters we can't convert:
>
[snip]
> Failed test-notify-changegroup: output changed

Regarding this one: the apparent failure at least isn't from us; the
traceback is generated during the loading of emails.Headers.
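
For anyone following along, guideline (b) from the quoted message (explicit
transcoding) can be sketched roughly as below. This is only an illustration
under stated assumptions: the tolocal/fromlocal functions here are
hypothetical stand-ins named after the util helpers Matt mentions, not hg's
actual implementation, and the latin-1 local encoding is assumed for the
example.

```python
# Hypothetical stand-ins for util.tolocal / util.fromlocal; NOT hg's real code.
LOCAL_ENCODING = "latin-1"  # assumed local encoding, chosen for illustration


def tolocal(s, encoding=LOCAL_ENCODING):
    """Transcode a UTF-8 byte string to the local encoding.

    Characters the local encoding cannot represent become '?'.
    """
    return s.decode("utf-8").encode(encoding, "replace")


def fromlocal(s, encoding=LOCAL_ENCODING):
    """Transcode a local-encoding byte string to UTF-8."""
    return s.decode(encoding).encode("utf-8")


# Only byte strings cross the function boundaries; the Unicode object
# exists briefly inside each helper and never leaks to the caller.
utf8_name = u"Beno\u00eet".encode("utf-8")
local_name = tolocal(utf8_name)
```

The point of the sketch is that every conversion happens at a named call
site, so a character that cannot be converted fails (or is replaced) there
rather than somewhere far away.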

regards,

Benoit