[PATCH] minirst: make substitutions be unicode, as they are used on unicode data

Matt Mackall mpm at selenic.com
Thu Jun 12 14:35:35 CDT 2014


On Sat, 2014-06-07 at 15:52 -0400, Augie Fackler wrote:
> # HG changeset patch
> # User Augie Fackler <raf at durin42.com>
> # Date 1402170658 14400
> #      Sat Jun 07 15:50:58 2014 -0400
> # Node ID efe7c79ca21edbdc2aeb83e60335c6d89294f3db
> # Parent  7afe70a5d2ad5b22c21ba9be849451407c1f337f
> minirst: make substitutions be unicode, as they are used on unicode data
> 
> Caught this while working to try and get 'hg help' working in Python 3.

Huh. Something changed here then. Every once in a while I add this to
hg:

    import sys
    reload(sys)  # Python 2 only: restores setdefaultencoding, which site.py deletes
    sys.setdefaultencoding("undefined")

...and clean up the resulting breakage[1]. At one point, I fixed this
precise thing, but in the other direction:

http://www.selenic.com/hg/rev/0ad0ebe67815

Looks like foozy switched it back, and reintroduced the implicit unicode
coercion:

http://www.selenic.com/hg/rev/87bb6b7644f6

We should stick to the rule "everything is a bytestring in the local
encoding" in the global scope, and decode/encode things only where
needed. Otherwise, we're always going to have confusion about the type
of arguments. So I'd rather do this in this case:

    utext = text.decode(encoding.encoding)
    for f, t in substs:
        utext = utext.replace(f.decode("ascii"), t.decode("ascii"))
    return utext.encode(encoding.encoding)

Also, we might consider encapsulating this encoding-aware replace in
encoding.py, which will let us be smarter about this in the future: when
the local encoding is not one of the "bad" multibyte encodings where
ASCII isn't a proper subset (e.g. Shift-JIS), we can do a normal replace
without transcoding.
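
A rough sketch of what such a helper might look like, written for
Python 3 so it can be run standalone (the snippet above is Python 2).
The names replace_local and _ascii_safe, the explicit local_encoding
parameter, and the small allowlist of "safe" encodings are all
assumptions made for illustration; none of them are existing Mercurial
APIs:

```python
import codecs

# Assumption: encodings in which an ASCII byte never occurs inside a
# multibyte character, so a bytewise replace of ASCII text is safe.
# This short allowlist is illustrative, not exhaustive.
_SAFE_ENCODINGS = {"ascii", "utf-8", "iso8859-1"}


def _ascii_safe(name):
    """Return True if the (normalized) encoding name is in the allowlist."""
    return codecs.lookup(name).name in _SAFE_ENCODINGS


def replace_local(text, substs, local_encoding):
    """Apply ASCII (old, new) byte substitutions to a local-encoding bytestring.

    Hypothetical helper: text and the substitution pairs are bytes, and
    the result is bytes in the same local encoding.
    """
    if _ascii_safe(local_encoding):
        # Fast path: no transcoding needed.
        for old, new in substs:
            text = text.replace(old, new)
        return text
    # Slow path: transcode so that ASCII bytes embedded in multibyte
    # characters (as in Shift-JIS) are never touched.
    utext = text.decode(local_encoding)
    for old, new in substs:
        utext = utext.replace(old.decode("ascii"), new.decode("ascii"))
    return utext.encode(local_encoding)
```

The slow path matters for an encoding like Shift-JIS, where the second
byte of a two-byte character can fall in the ASCII range: a plain
bytewise replace of b"A" would also rewrite the trailing byte of such a
character and silently turn it into a different one.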

[1] Automatic promotion to Unicode is the biggest flaw in Py2

-- 
Mathematics is the supreme nostalgia of our time.



