[PATCH stable] templatefilters: make json filter handle multibyte characters correctly

Patrick Mézard pmezard at gmail.com
Sun Aug 8 12:11:23 CDT 2010


On 08/08/10 18:54, Yuya Nishihara wrote:
> Patrick Mézard wrote:
>> On 07/08/10 09:36, Yuya Nishihara wrote:
>>> # HG changeset patch
>>> # User Yuya Nishihara <yuya at tcha.org>
>>> # Date 1281166036 -32400
>>> # Branch stable
>>> # Node ID 0e36aafcca8fedbf60e05b985d5f6426045c8e28
>>> # Parent  36e25f25dec11e68fc3240326999c02b3879ab10
>>> templatefilters: make json filter handle multibyte characters correctly
>>>
>>> It aims to fix a javascript error in hgweb's graph view under the Japanese
>>> 'cp932' encoding.
>>>
>>> 'cp932' contains multibyte characters ending with '\x5c' (backslash),
>>> e.g. '\x94\x5c' for Japanese Kanji 'Noh'.
>>> Because the json filter escapes '\' to '\\', a multibyte string ending with
>>> '\x5c' is translated to "xxx\", resulting in a javascript parse error in
>>> a web browser.
>>>
>>> This patch changes json() to pass unicode to jsonescape().
>>>
>>> diff --git a/mercurial/templatefilters.py b/mercurial/templatefilters.py
>>> --- a/mercurial/templatefilters.py
>>> +++ b/mercurial/templatefilters.py
>>> @@ -156,9 +156,13 @@ def json(obj):
>>>      elif isinstance(obj, int) or isinstance(obj, float):
>>>          return str(obj)
>>>      elif isinstance(obj, str):
>>> -        return '"%s"' % jsonescape(obj)
>>> +        try:
>>> +            return '"%s"' % jsonescape(unicode(
>>> +                obj, encoding.encoding)).encode(encoding.encoding)
>>> +        except (UnicodeEncodeError, UnicodeDecodeError):
>>> +            return '"%s"' % jsonescape(obj)
>>
>> So, if we fail to decode/encode the string, we still may generate an invalid
>> JSON string, right? Shouldn't we "unicode(obj, encoding.encoding, 'replace')"
>> or something similar instead?
> 
> If we can assume that the encoding is correctly setup, 'replace' seems better.

Better than the other error handling modes ("strict", "ignore", etc.) or better than the original version you posted? It's not clear to me whether you think using 'replace' is better in general or only if we make the "correctly setup encoding" assumption, i.e., "should I edit your patch or not?" :-)
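
To make the comparison of error handling modes concrete, here is a minimal sketch. It is written for Python 3, where bytes.decode(enc, errors) plays the role that unicode(obj, enc, errors) plays in the patch. The sample bytes and the decode_with() helper are made up for illustration and are not part of Mercurial.

```python
# Hypothetical input: 0xff can never appear in valid UTF-8.
raw = b'abc\xff'

def decode_with(mode):
    """Decode `raw` as UTF-8 with the given error mode.

    Returns None when decoding raises, which is what 'strict' does
    on malformed input.
    """
    try:
        return raw.decode('utf-8', mode)
    except UnicodeDecodeError:
        return None

strict_result = decode_with('strict')    # raises internally, so None
ignore_result = decode_with('ignore')    # silently drops the bad byte
replace_result = decode_with('replace')  # substitutes U+FFFD for it

for name, value in [('strict', strict_result),
                    ('ignore', ignore_result),
                    ('replace', replace_result)]:
    print(name, repr(value))
```

With "strict" the caller has to handle the exception somehow, "ignore" loses data without leaving a trace, and "replace" keeps a visible marker where the bad byte was.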

If the encoding is not correctly set, json() will likely output a bunch of U+FFFD replacement characters but will not break the JSON format. I don't think we can do better than that, short of completely failing to encode the input string.
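
A rough sketch of what I mean, again in Python 3 rather than the Python 2 of the patch. escape_bytes() and json_string() are hypothetical names, not the actual jsonescape()/json() from templatefilters.py, and the escaping only covers backslash and double quote, so treat it as an illustration of the decode/escape/encode order and not as a complete JSON encoder.

```python
def escape_bytes(raw):
    """Naive byte-level escaping: doubles every 0x5c byte, including
    one that is the trailing byte of a multibyte character."""
    return raw.replace(b'\\', b'\\\\')

def json_string(raw, enc):
    """Decode with 'replace', escape in Unicode space, then re-encode.

    Malformed input turns into replacement characters instead of
    raising, so the result is always a quoted, well-formed string.
    """
    text = raw.decode(enc, 'replace')
    escaped = text.replace('\\', '\\\\').replace('"', '\\"')
    return b'"' + escaped.encode(enc, 'replace') + b'"'

# The cp932 character from the patch description: its second byte is 0x5c.
noh = b'\x94\x5c'

# Byte-level escaping appends a backslash after the character, which
# would then escape the closing quote of the JSON string.
print(escape_bytes(noh))

# Escaping after decoding leaves the character untouched.
print(json_string(noh, 'cp932'))

# Input that is invalid in the configured encoding (here UTF-8) still
# yields a well-formed string, with U+FFFD in place of the bad byte.
print(json_string(b'ab\xff', 'utf-8'))
```

Passing 'replace' to encode() as well covers the case where a replacement character cannot be represented in the target encoding.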

--
Patrick Mézard

