[Bug 4926] New: Cannot roundtrip multibyte utf8 char with the templater json escaper
mercurial-bugs at selenic.com
Mon Nov 2 13:36:17 UTC 2015
https://bz.mercurial-scm.org/show_bug.cgi?id=4926
Bug ID: 4926
Summary: Cannot roundtrip multibyte utf8 char with the templater json escaper
Product: Mercurial
Version: unspecified
Hardware: PC
OS: Linux
Status: UNCONFIRMED
Severity: bug
Priority: wish
Component: templater
Assignee: bugzilla at selenic.com
Reporter: pierre-yves.david at ens-lyon.org
CC: mercurial-devel at selenic.com
Add this diff to the fuzz testing of templatefilters.jsonescape:
diff --git a/tests/test-template-engine.t b/tests/test-template-engine.t
--- a/tests/test-template-engine.t
+++ b/tests/test-template-engine.t
@@ -51,10 +51,12 @@ Fuzzing the unicode escaper to ensure it
   >>> from hypothesishelpers import *
   >>> import mercurial.templatefilters as tf
   >>> import json
   >>> @check(st.text().map(lambda s: s.encode('utf-8')))
   ... def testtfescapeproducesvalidjson(text):
-  ...     json.loads('"' + tf.jsonescape(text) + '"')
+  ...     uni = json.loads('"' + tf.jsonescape(text) + '"')
+  ...     result = uni.encode('utf-8')
+  ...     assert text == result, (text, result, uni)
 #endif
   $ cd ..
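For readers without the Mercurial test harness, the property the patched test asserts can be sketched in plain Python 3 (where bytes and str are distinct types). The escaper below, `utf8_json_escape`, is hypothetical and is not Mercurial's templatefilters.jsonescape: it is an assumed reference escaper that decodes the UTF-8 input first, so each Unicode code point (not each byte) gets its own escape.

```python
import json


def utf8_json_escape(data: bytes) -> str:
    """Hypothetical reference escaper (not Mercurial's code): decode the
    UTF-8 bytes first, then let json.dumps emit one escape per code point."""
    text = data.decode("utf-8")
    # json.dumps wraps the string in double quotes; strip them so the
    # result has the same shape as an escaper's output.
    return json.dumps(text, ensure_ascii=True)[1:-1]


def roundtrips(escape, data: bytes) -> bool:
    """The property from the patched test: escape, parse as a JSON string,
    re-encode to UTF-8, and compare with the original bytes."""
    uni = json.loads('"' + escape(data) + '"')
    return uni.encode("utf-8") == data


print(utf8_json_escape(b"\xc2\x80"))              # \u0080
print(roundtrips(utf8_json_escape, b"\xc2\x80"))  # True
```

With an escaper that works per code point, the input from the report roundtrips, which suggests the test's UTF-8 re-encoding is a reasonable check for such an escaper.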
This produces a failure for strings such as '\xc2\x80':
+ AssertionError: ('\xc2\x80', '\xc3\x82\xc2\x80', u'\xc2\x80')
What seems to happen is that the bytes '\xc2\x80' (the UTF-8 encoding of a
single Unicode character, U+0080) are escaped (using the '\u####' syntax) as
two separate characters, one per byte. The JSON decoder reads them back as two
distinct characters, U+00C2 and U+0080, so re-encoding the result to UTF-8
produces '\xc3\x82\xc2\x80' instead of the original input bytes.
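The suspected behaviour can be reproduced with a small Python 3 model. `bytewise_json_escape` is hypothetical: it is an assumption, inferred only from the AssertionError values in this report, that each non-ASCII byte is escaped as its own \u00XX code point. It is not the actual templatefilters.jsonescape source.

```python
import json


def bytewise_json_escape(data: bytes) -> str:
    """Hypothetical model of the reported behaviour: every non-ASCII *byte*
    becomes its own \\u00XX escape, ignoring UTF-8 multibyte sequences."""
    out = []
    for b in data:
        if b < 0x80:
            # For ASCII, escape only what JSON requires (quotes, controls).
            out.append(json.dumps(chr(b))[1:-1])
        else:
            out.append("\\u%04x" % b)
    return "".join(out)


text = b"\xc2\x80"                     # UTF-8 for the single character U+0080
escaped = bytewise_json_escape(text)   # '\\u00c2\\u0080': one escape per byte
uni = json.loads('"' + escaped + '"')  # two code points: U+00C2, U+0080
result = uni.encode("utf-8")           # b'\xc3\x82\xc2\x80'
print(result == text)                  # False, as in the reported AssertionError
```

The model yields the same `result` bytes as the AssertionError above, which is consistent with (though not proof of) the per-byte explanation.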
Our test may be faulty here (perhaps the bytes should be retrieved some other
way, but I have no idea what that other way would be).
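One candidate "other way", sketched under an assumption: if the escaper really maps each byte to the code point with the same value (U+0000 to U+00FF), then Latin-1 is exactly the encoding that maps those code points back to single bytes. The names `bytewise_json_escape` and `recover_bytes` are hypothetical, and the escaper is a model, not Mercurial's code. Note that this only makes the test agree with such an escaper; a JSON consumer would still see U+00C2 U+0080 rather than U+0080, so whether the test or the escaper is at fault remains open.

```python
import json


def bytewise_json_escape(data: bytes) -> str:
    """Hypothetical model of the reported escaper: each non-ASCII byte
    becomes its own \\u00XX escape, as the AssertionError suggests."""
    return "".join(
        json.dumps(chr(b))[1:-1] if b < 0x80 else "\\u%04x" % b
        for b in data
    )


def recover_bytes(escaped: str) -> bytes:
    """Parse the JSON string, then map each code point U+0000..U+00FF back
    to the byte of the same value; Latin-1 is exactly that mapping."""
    uni = json.loads('"' + escaped + '"')
    return uni.encode("latin-1")


sample = "naïve €".encode("utf-8")
print(recover_bytes(bytewise_json_escape(sample)) == sample)  # True
```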
More information about the Mercurial-devel mailing list