Finding latent encoding bugs

Matt Mackall mpm at selenic.com
Tue Oct 28 19:23:56 CDT 2008


Python likes to pretend that Unicode objects are just like strings, an
idea that seems nice in theory, but generally results in code that works
for the developer but not in the field. Because Unicode strings can
'infect' normal strings, the bug can crop up far from where the Unicode
string was introduced.
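
To make the failure mode concrete, here is a rough sketch of what the
implicit coercion amounts to. It is a simulation, not Python's own code:
the helper name py2_style_concat is made up for illustration, and the
explicit decode("ascii") stands in for the conversion Python 2 performs
silently when a Unicode object meets a byte string.

```python
def py2_style_concat(text, raw):
    """Mimic the implicit coercion: decode the byte string as ASCII, then join."""
    return text + raw.decode("ascii")


# ASCII-only data, as on a typical developer machine: everything works.
ok = py2_style_concat(u"user: ", b"matt")

# Non-ASCII bytes, as seen in the field: the same code path blows up.
try:
    py2_style_concat(u"user: ", u"Jos\u00e9".encode("utf-8"))
    failed = False
except UnicodeDecodeError:
    failed = True
```

The point is that nothing in the calling code changes between the two
cases; only the data does.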

So we try to follow three guidelines:

(a) never pass Unicode objects inside hg, only utf-8 or local strings
(b) explicitly transcode strings (with util.tolocal or util.fromlocal)
(c) minimize transcoding by doing everything in the local encoding where
possible, centralizing transcoding to the (very few) places that need it
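
As an illustration of guidelines (a) and (b), here is a minimal sketch
of a pair of explicit transcoding helpers. These are not the actual
util.tolocal and util.fromlocal: the names to_local and from_local, the
LOCAL_ENCODING constant and the choice of latin-1 are all assumptions
made for the example. Only byte strings cross the function boundaries;
the Unicode object exists briefly inside each helper.

```python
# Assumption: a fixed stand-in for whatever the user's locale encoding is.
LOCAL_ENCODING = "latin-1"


def to_local(s):
    """Convert UTF-8 bytes to bytes in the local encoding.

    Characters the local encoding cannot represent become '?'.
    """
    return s.decode("utf-8").encode(LOCAL_ENCODING, "replace")


def from_local(s):
    """Convert bytes in the local encoding back to UTF-8 bytes."""
    return s.decode(LOCAL_ENCODING).encode("utf-8")


stored = u"caf\u00e9".encode("utf-8")  # what would be kept internally
shown = to_local(stored)               # what would be printed to the user
roundtrip = from_local(shown)
```

Keeping the conversion in two small functions like this is what makes
guideline (c) practical: there are only a couple of places to audit.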

But because it's so easy for Unicode strings to sneak in when dealing
with encodings and third-party code, I've come up with the following
hack to quickly find all the spots where Unicode strings are getting
transparently converted to regular strings or vice-versa, most of which
are potential bugs if we encounter characters we can't convert:

diff -r dcf8b57b84b7 mercurial/util.py
--- a/mercurial/util.py	Tue Oct 28 15:04:47 2008 -0500
+++ b/mercurial/util.py	Tue Oct 28 17:07:02 2008 -0500
@@ -17,6 +17,10 @@
 import os, stat, threading, time, calendar, ConfigParser, locale, glob, osutil
 import imp
 
+dir(sys) # outsmart demand loader
+reload(sys) # undo site.py's del
+sys.setdefaultencoding('undefined')
+
 # Python compatibility
 
 try:
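
For anyone wondering why 'undefined' works as a default encoding here:
it is a codec shipped with the standard library whose encode and decode
always fail, so any conversion routed through it raises instead of
silently succeeding on ASCII data. The sketch below exercises the codec
directly and does not touch the default encoding; the helper name
conversion_blocked is made up for illustration.

```python
import codecs

# The lookup succeeds: 'undefined' is a registered standard-library codec.
codecs.lookup("undefined")


def conversion_blocked(data):
    """Return True if converting data through 'undefined' raises UnicodeError."""
    try:
        if isinstance(data, bytes):
            data.decode("undefined")
        else:
            data.encode("undefined")
    except UnicodeError:
        return True
    return False


# Even pure-ASCII input is rejected, in both directions.
bytes_blocked = conversion_blocked(b"abc")
text_blocked = conversion_blocked(u"abc")
```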

run-tests.py reports:

Failed test-convert-clonebranches: output changed
Failed test-convert-datesort: output changed and returned error code 255
Failed test-convert-hg-startrev: output changed
Failed test-convert-svn-sink: output changed
Failed test-doctest.py: output changed and returned error code 1
Failed test-highlight: output changed
Failed test-convert-svn-startrev: output changed
Failed test-patchbomb: output changed and returned error code 1
Failed test-convert: output changed and returned error code 1
Failed test-convert-hg-sink: output changed and returned error code 255
Failed test-convert-svn-branches: output changed
Failed test-convert-svn-tags: output changed and returned error code 255
Failed test-notify: output changed
Failed test-convert-filemap: output changed and returned error code 255
Failed test-convert-hg-svn: output changed and returned error code 1
Failed test-convert-svn-source: output changed
Failed test-keyword: output changed
Failed test-convert-hg-source: output changed
Failed test-convert-svn-move: output changed
Failed test-notify-changegroup: output changed
# Ran 300 tests, 12 skipped, 20 failed.

So it looks like we've got problems in convert (mostly SVN), patchbomb,
highlight, notify, and keyword.

-- 
Mathematics is the supreme nostalgia of our time.


