[PATCH 0 of 1 stable] Re: [PATCH stable] templatefilters: make json filter handle multibyte characters correctly
Laurens Holst
laurens.nospam at grauw.nl
Tue Aug 10 11:36:58 CDT 2010
On 10-08-10 17:25, Matt Mackall wrote:
> On Tue, 2010-08-10 at 15:00 +0200, Dirkjan Ochtman wrote:
>
>> On Tue, Aug 10, 2010 at 13:47, Patrick Mézard<pmezard at gmail.com> wrote:
>>
>>> The situation and use cases are really unclear to me. I thought at first the motivation for json in template filters came from Benjamin Smedberg (http://markmail.org/message/2zlxjatcqha4cdrs#query:json%20mercurial+page:1+mid:2zlxjatcqha4cdrs+state:results) as an exchange format, while annotate reveals it was introduced by you for the web graph view (and that Benjamin's original need was similar).
>>>
>>> So, assuming browsers can deal with random byte strings encoded as Javascript literals, then it's fine by definition. But if json() is supposed to be used as an API/exchange format (and not only from (inline?) Javascript), I think we should enforce proper JSON format.
>>>
>>> I am reluctant to call JSON something which might not be.
>>>
>> Okay, so yeah, I haven't been aware of context either. I needed
>> something to put the data in the HTML, and JSON seemed like a good
>> fit. We chose to implement our own templatefilter rather than
>> importing simplejson because our code is much smaller (doesn't include
>> parser, doesn't have to deal with edge cases). I also thought (and
>> still think) that it might be beneficial for extensions that want to
>> exchange data (I know Mozilla has some JSON-based hgweb pages), but I
>> don't have a solution for the present problem, I'm afraid.
>>
>> Is the conclusion so far that using UTF-8 JSON breaks in a page that
>> has a non-UTF-8 encoding?
>>
> The spec allows \uXXXX, so I think we should be able to send pure ASCII
> JSON to avoid any possible encoding confusion.
>
Given that the spec says the encoding is UTF-8, and if you let hgweb
send the correct headers, there should be no reason to do escaping. I
don’t think saying ‘It’s UTF-8, period’ is really more confusing than
saying ‘it’s ASCII’.
From what I understand the JSON output is wrapped in a JavaScript
method call; JavaScript being processed as ISO-8859-1 by default, in
that case having the server send Content-Type: text/javascript;
charset=UTF-8 ought to resolve the encoding problems.
That’s IMO a more elegant solution than escaping everything, and it will
be easier to read for a human, too.
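
To make the two options concrete, here is a minimal sketch using
Python's standard json module (Python 3), not Mercurial's own json
template filter. The helper names ascii_json and utf8_json_response are
hypothetical, and the header list only illustrates what a server such
as hgweb would need to send; it is not hgweb's actual API.

```python
import json


def ascii_json(value):
    """Option 1: pure-ASCII JSON, non-ASCII characters become \\uXXXX."""
    return json.dumps(value, ensure_ascii=True)


def utf8_json_response(value):
    """Option 2: unescaped JSON, encoded as UTF-8, with a matching header.

    Returns (headers, body); the header declares the charset so the
    browser does not fall back to another default encoding.
    """
    body = json.dumps(value, ensure_ascii=False).encode("utf-8")
    headers = [
        ("Content-Type", "text/javascript; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ]
    return headers, body


author = "Patrick Mézard"

escaped = ascii_json(author)
headers, body = utf8_json_response(author)

print(escaped)   # ASCII only, with an escape for the accented letter
print(headers)   # declares charset=UTF-8
print(body)      # raw UTF-8 bytes, two bytes for the accented letter
```

Both forms decode to the same string. The difference is only whether
the safety comes from escaping (works under any page encoding) or from
declaring the encoding correctly (more readable output).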
~Grauw
--
~~ Ushiko-san! Kimi wa doushite, Ushiko-san nan da!! ~~
Laurens Holst, developer, Utrecht, the Netherlands
Website: www.grauw.nl. Backbase employee; www.backbase.com