[PATCH 1 of 2] encoding: make utf8b encoder more robust (issue4927)
Yuya Nishihara
yuya at tcha.org
Fri Nov 6 08:03:56 CST 2015
On Wed, 4 Nov 2015 22:46:26 +0900, Yuya Nishihara wrote:
> It might be possible to use an error handler to map invalid chars to \udcxx,
> but I've never tried it and it seems the handler table is global.
>
> https://docs.python.org/2.7/library/codecs.html#codecs.register_error
Catching the error won't work if the source string already contains a valid
surrogate-encoded sequence, since decoding such a string raises no error:
>>> s = u'\udc00'.encode('utf-8')
>>> encoding.toutf8b(s)
'\xed\xb0\x80' # should be '\xed\xb3\xad\xed\xb2\xb0\xed\xb2\x80' ?
>>> encoding.fromutf8b(encoding.toutf8b(s))
'\x00'
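
For anyone who wants to reproduce this outside of Mercurial, here is a rough
Python 3 sketch of the problem. It is not the code in encoding.py: the names
naive_toutf8b, naive_fromutf8b and escaping_toutf8b are made up for this
example, and 'surrogatepass' is used only to imitate the Python 2 decoder,
which accepts encoded surrogates. The last function shows one possible way to
get the output suggested above, assuming every byte of an already-encoded
U+DCxx sequence should itself be escaped.

```python
# Hypothetical sketch, not Mercurial's implementation.

def naive_toutf8b(s: bytes) -> bytes:
    """Return s unchanged if it decodes as UTF-8; escape bad bytes otherwise."""
    try:
        # 'surrogatepass' imitates Python 2, where encoded surrogates decode
        # without raising, so no error is available to catch.
        s.decode("utf-8", "surrogatepass")
        return s
    except UnicodeDecodeError:
        # Map each undecodable byte 0xNN to U+DCNN, then re-encode.
        return s.decode("utf-8", "surrogateescape").encode("utf-8", "surrogatepass")


def naive_fromutf8b(s: bytes) -> bytes:
    """Turn every U+DCxx character back into the single byte 0xxx."""
    out = bytearray()
    for c in s.decode("utf-8", "surrogatepass"):
        if ord(c) & 0xFF00 == 0xDC00:
            out.append(ord(c) & 0xFF)
        else:
            out += c.encode("utf-8", "surrogatepass")
    return bytes(out)


def escaping_toutf8b(s: bytes) -> bytes:
    """Assumed fix: treat encoded surrogates as invalid and escape their bytes.

    Python 3's strict UTF-8 decoder rejects encoded surrogates, so
    'surrogateescape' maps each of their bytes to its own U+DCxx character.
    """
    return s.decode("utf-8", "surrogateescape").encode("utf-8", "surrogatepass")


s = "\udc00".encode("utf-8", "surrogatepass")     # b'\xed\xb0\x80'
print(naive_toutf8b(s))                            # b'\xed\xb0\x80' (unchanged)
print(naive_fromutf8b(naive_toutf8b(s)))           # b'\x00' (round trip lost)
print(escaping_toutf8b(s))                         # b'\xed\xb3\xad\xed\xb2\xb0\xed\xb2\x80'
print(naive_fromutf8b(escaping_toutf8b(s)) == s)   # True
```

With the escaping variant the decoder does not need to change, because the
input's own surrogate bytes are no longer passed through as-is.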