[PATCH] minirst: make substitutions be unicode, as they are used on unicode data
Matt Mackall
mpm at selenic.com
Thu Jun 12 14:35:35 CDT 2014
On Sat, 2014-06-07 at 15:52 -0400, Augie Fackler wrote:
> # HG changeset patch
> # User Augie Fackler <raf at durin42.com>
> # Date 1402170658 14400
> # Sat Jun 07 15:50:58 2014 -0400
> # Node ID efe7c79ca21edbdc2aeb83e60335c6d89294f3db
> # Parent 7afe70a5d2ad5b22c21ba9be849451407c1f337f
> minirst: make substitutions be unicode, as they are used on unicode data
>
> Caught this while working to try and get 'hg help' working in Python 3.
Huh. Something changed here then. Every once in a while I add this to
hg:
  reload(sys)
  sys.setdefaultencoding("undefined")
..and clean up the resulting breakage[1]. At one point, I fixed this
precise thing.. in the other direction:
http://www.selenic.com/hg/rev/0ad0ebe67815
Looks like foozy switched it back, and reintroduced the implicit unicode
coercion:
http://www.selenic.com/hg/rev/87bb6b7644f6
In general, we should stick to the rule "everything is a bytestring in
the local encoding" in the global scope and decode/encode things as
needed. Otherwise, we're always going to have confusion about the type
of arguments. So I'd rather do this in this case:
  utext = text.decode(encoding.encoding)
  for f, t in substs:
      utext = utext.replace(f.decode("ascii"), t.decode("ascii"))
  return utext.encode(encoding.encoding)
Also, we might consider encapsulating this encoding-aware replace in
encoding.py, which will let us be smarter about this in the future: when
the local encoding isn't one of the "bad" multibyte encodings where ASCII
isn't a proper subset (e.g. Shift-JIS), we can do a normal bytestring
replace without transcoding.
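
A rough sketch of what such a helper might look like, written so that it
runs on a modern Python. The name replace_local, the explicit enc
parameter, and the allowlist of "safe" encodings are assumptions made for
illustration; none of this is existing encoding.py API. Anything not on
the allowlist falls back to the decode/replace/encode path:

```python
import codecs

# Assumption for this sketch: encodings in which an ASCII byte value can
# never appear inside a multibyte character, so a plain byte-level
# replace is safe. Names are in the form returned by codecs.lookup().
_BYTE_SAFE = {"ascii", "utf-8", "iso8859-1"}


def replace_local(text, substs, enc):
    """Apply ASCII (old, new) substitutions to text.

    text is a bytestring in the local encoding enc; substs is a list
    of pairs of ASCII bytestrings. Returns a bytestring in enc.
    """
    if codecs.lookup(enc).name in _BYTE_SAFE:
        # Fast path: no transcoding needed.
        for old, new in substs:
            text = text.replace(old, new)
        return text
    # Slow path: transcode so we only ever match whole characters.
    utext = text.decode(enc)
    for old, new in substs:
        utext = utext.replace(old.decode("ascii"), new.decode("ascii"))
    return utext.encode(enc)


# U+30C1 (katakana CHI) is b"\x83\x60" in Shift-JIS: its second byte
# is an ASCII backtick, which a naive byte-level replace would clobber.
sjis = "\u30c1`x`".encode("shift_jis")
fixed = replace_local(sjis, [(b"`", b'"')], "shift_jis")
```

With the slow path, the trailing byte of the katakana character is left
alone and only the real backticks are replaced.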
[1] Automatic promotion to Unicode is the biggest flaw in Py2
--
Mathematics is the supreme nostalgia of our time.