[PATCH] encoding: handle UTF-16 internal limit with fromutf8b (issue5031)
Yuya Nishihara
yuya at tcha.org
Mon Jan 11 09:05:49 CST 2016
On Sun, 10 Jan 2016 17:09:45 -0600, Matt Mackall wrote:
> # HG changeset patch
> # User Matt Mackall <mpm at selenic.com>
> # Date 1452200277 21600
> # Thu Jan 07 14:57:57 2016 -0600
> # Node ID 33819d463ddc7895bcd917d85b9ae1ab502c1547
> # Parent b8405d739149cdd6d8d9bd5e3dd2ad8487b1f09a
> encoding: handle UTF-16 internal limit with fromutf8b (issue5031)
>
> Default builds of Python have a Unicode type that isn't actually full
> Unicode but UTF-16, which encodes non-BMP codepoints to a pair of BMP
> codepoints with surrogate escaping. Since our UTF-8b hack escaping
> uses a plane that overlaps with the UTF-16 escaping system, this gets
> extra complicated. In addition, unichr() for codepoints greater than
> U+FFFF may not work either.
>
> This changes the code to reuse getutf8char to walk the byte string, so we
> only rely on Python for unpacking our U+DCxx characters.

Looks great. Pushed to the clowncopter, thanks.

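For anyone reading this in the archive, here is a rough sketch of the idea in the commit message. It is not the code from the patch: the names next_utf8_char, unescape_utf8b and utf16_surrogates are made up for illustration, it is written for Python 3 bytes, and it assumes the input is already well-formed UTF-8b. It walks the byte string one UTF-8 sequence at a time and only touches the three-byte sequences that encode U+DC00..U+DCFF, so no non-BMP character ever passes through the Unicode type.

```python
# Illustrative sketch only; all names are hypothetical, not Mercurial's API.

# High nibble of the lead byte -> sequence length (0 means "one plain byte").
_UTF8LEN = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 3, 4]


def next_utf8_char(s, pos):
    """Return the bytes of the UTF-8 sequence starting at s[pos]."""
    n = _UTF8LEN[s[pos] >> 4] or 1
    return s[pos:pos + n]


def unescape_utf8b(s):
    """Turn U+DCxx escape sequences back into the raw bytes they stand for."""
    if b"\xed" not in s:  # fast path: no U+Dxxx lead byte anywhere
        return s
    out = bytearray()
    pos = 0
    while pos < len(s):
        c = next_utf8_char(s, pos)
        pos += len(c)
        # U+DC00..U+DCFF encode as ED B0 80 .. ED B3 BF
        if len(c) == 3 and b"\xed\xb0\x80" <= c <= b"\xed\xb3\xbf":
            cp = ((c[0] & 0x0F) << 12) | ((c[1] & 0x3F) << 6) | (c[2] & 0x3F)
            out.append(cp & 0xFF)  # the low byte is the original raw byte
        else:
            out += c  # ordinary character, copied through untouched
    return bytes(out)


def utf16_surrogates(cp):
    """Surrogate pair a UTF-16 build would use for a non-BMP code point."""
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)
```

The last helper only illustrates the overlap mentioned in the commit message: on a UTF-16 build, U+10080 is stored with the low surrogate U+DC80, which is the same code unit the escaping scheme uses for the raw byte 0x80.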
> @@ -9,6 +9,8 @@
>
> import locale
> import os
> +import struct
> +import sys
Dropped these two imports; they appear to be left over from the previous version and are unused.