[PATCH] Obfuscate Me Gently

Fri Jun 30 13:47:43 CDT 2006

On Friday, 30 June 2006 at 11:32, Eric Hopper wrote:
> On Fri, Jun 30, 2006 at 11:15:41AM -0700, Brendan Cully wrote:
> > Any further thoughts on this? I'm not actually sure how to properly
> > encode utf-8 as &#..; entities, and I do think this approach will
> > provide complete coverage for e-mail addresses, since they don't
> > contain 8-bit characters anyway.
> 
> UTF-8 and &#...; entities are different ways to do the same thing.
> Both encode characters from the full Unicode character set using a
> more limited set of characters.

Yeah, I just didn't know how to get Unicode code points (UCS) out of
those strings (I don't do a lot of Unicode). But the following patch
seems to work.
-------------- next part --------------
# HG changeset patch
# User Brendan Cully <brendan at kublai.com>
# Node ID 86f71b55bc28d11d84ea8181b6e10641d6d45511
# Parent  b73552a00b209c8fcd71fd23fb990c5d05910010
Attempt to convert strings from UTF-8 before obfuscating them.
This should avoid accidentally splitting UTF-8 characters.

diff -r b73552a00b20 -r 86f71b55bc28 mercurial/templater.py

--- a/mercurial/templater.py	Mon Jun 26 16:47:24 2006 +0200
+++ b/mercurial/templater.py	Fri Jun 30 11:46:03 2006 -0700
@@ -230,6 +230,10 @@ def nl2br(text):
     return text.replace('\n', '<br/>\n')
 
 def obfuscate(text):
+    try:
+        text = unicode(text, 'utf-8')
+    except UnicodeDecodeError:
+        pass
     return ''.join(['&#%d;' % ord(c) for c in text])
 
 def domain(author):
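
For readers on Python 3, where the unicode() builtin no longer exists,
the same idea might be sketched as below. This is an illustration under
stated assumptions, not the code in Mercurial: obfuscate_bytewise is a
hypothetical name for the pre-patch behaviour, and the latin-1 fallback
is an assumed stand-in for the patch's "pass" branch (it maps each byte
to the code point of the same value, as the Python 2 byte-string loop
did).

```python
def obfuscate_bytewise(raw):
    """Pre-patch behaviour: one entity per byte, which splits
    multi-byte UTF-8 characters into meaningless pieces."""
    return ''.join('&#%d;' % b for b in raw)

def obfuscate(text):
    """Post-patch behaviour: decode UTF-8 first, then emit one
    entity per code point."""
    if isinstance(text, bytes):
        try:
            text = text.decode('utf-8')
        except UnicodeDecodeError:
            # Assumed fallback: treat each byte as its own code point.
            text = text.decode('latin-1')
    return ''.join('&#%d;' % ord(c) for c in text)

raw = '\u00e9'.encode('utf-8')          # b'\xc3\xa9'
print(obfuscate_bytewise(raw))          # per-byte entities
print(obfuscate(raw))                   # single entity for U+00E9
```

Pure ASCII input, such as a typical e-mail address, comes out the same
from both functions; they only differ once a multi-byte character
appears.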