[Bug 4926] New: Cannot roundtrip multibyte utf8 char with the templater json escaper

Mon Nov 2 13:36:17 UTC 2015

https://bz.mercurial-scm.org/show_bug.cgi?id=4926

            Bug ID: 4926
           Summary: Cannot roundtrip multibyte utf8 char with the
                    templater json escaper
           Product: Mercurial
           Version: unspecified
          Hardware: PC
                OS: Linux
            Status: UNCONFIRMED
          Severity: bug
          Priority: wish
         Component: templater
          Assignee: bugzilla at selenic.com
          Reporter: pierre-yves.david at ens-lyon.org
                CC: mercurial-devel at selenic.com

Apply this diff to the fuzz test of templatefilters.jsonescape:

diff --git a/tests/test-template-engine.t b/tests/test-template-engine.t
--- a/tests/test-template-engine.t
+++ b/tests/test-template-engine.t
@@ -51,10 +51,12 @@ Fuzzing the unicode escaper to ensure it
   >>> from hypothesishelpers import *
   >>> import mercurial.templatefilters as tf
   >>> import json
   >>> @check(st.text().map(lambda s: s.encode('utf-8')))
   ... def testtfescapeproducesvalidjson(text):
-  ...     json.loads('"' + tf.jsonescape(text) + '"')
+  ...     uni = json.loads('"' + tf.jsonescape(text) + '"')
+  ...     result = uni.encode('utf-8')
+  ...     assert text == result, (text, result, uni)

 #endif

   $ cd ..

This produces a failure for strings like '\xc2\x80':

+  AssertionError: ('\xc2\x80', '\xc3\x82\xc2\x80', u'\xc2\x80')

What seems to happen is that the bytes '\xc2\x80' (a single Unicode
character, U+0080, in UTF-8) are escaped (using the '\u####' syntax) as two
Unicode characters, one per byte. The JSON decoder reads them as two
distinct Unicode characters. Re-encoding the decoded string to UTF-8 to
retrieve the bytes then fails to produce the same bytes as the input.
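
The mechanism can be reproduced outside Mercurial. The sketch below is
written for Python 3 and uses a hypothetical helper, bytewise_escape, that
escapes every byte as its own '\u00XX' sequence. That helper is only an
assumption about the kind of escaping that would explain the failure above;
it is not the actual tf.jsonescape implementation.

```python
import json


def bytewise_escape(data: bytes) -> str:
    """Hypothetical stand-in: escape each byte as its own \\u00XX sequence."""
    return "".join("\\u%04x" % b for b in data)


text = b"\xc2\x80"                      # UTF-8 encoding of the single char U+0080
escaped = bytewise_escape(text)         # two escapes, one per input byte
uni = json.loads('"' + escaped + '"')   # decoder sees two code points
result = uni.encode("utf-8")            # each code point is re-encoded on its own

print(repr(escaped), len(uni), result, result == text)
```

Under that assumption the decoded string holds U+00C2 and U+0080, and
encoding those two characters to UTF-8 yields four bytes rather than the
original two, which matches the AssertionError shown above.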

Our test may be faulty here (retrieving the bytes should maybe be achieved
another way), but I have no idea what that other way would be.
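
One candidate, offered only as a sketch: if the escaper really emits one
'\u00XX' sequence per input byte (an assumption, not something confirmed
here), every decoded code point is below U+0100, and encoding the decoded
string as Latin-1 instead of UTF-8 maps each code point back to one byte.
The example is Python 3, and the escaped string is written by hand to
stand in for the escaper's assumed output.

```python
import json

text = b"\xc2\x80"

# Assumed escaper output for these two bytes: one \u00XX sequence per byte.
escaped = "\\u00c2\\u0080"

uni = json.loads('"' + escaped + '"')

# Latin-1 maps code points U+0000..U+00FF one-to-one onto byte values.
recovered = uni.encode("latin-1")

print(recovered == text)
```

This would raise UnicodeEncodeError for any decoded code point above
U+00FF, so it only fits if the escaper never emits such escapes.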
