[PATCH 0 of 1 stable] Re: [PATCH stable] templatefilters: make json filter handle multibyte characters correctly

Laurens Holst laurens.nospam at grauw.nl
Tue Aug 10 11:36:58 CDT 2010


On 10-08-10 17:25, Matt Mackall wrote:
> On Tue, 2010-08-10 at 15:00 +0200, Dirkjan Ochtman wrote:
>    
>> On Tue, Aug 10, 2010 at 13:47, Patrick Mézard <pmezard at gmail.com> wrote:
>>      
>>> The situation and use cases are really unclear to me. I thought at first the motivation for json in template filters came from Benjamin Smedberg (http://markmail.org/message/2zlxjatcqha4cdrs#query:json%20mercurial+page:1+mid:2zlxjatcqha4cdrs+state:results) as an exchange format, while annotate reveals it was introduced by you for the web graph view (and that Benjamin's original need was similar).
>>>
>>> So, assuming browsers can deal with random byte strings encoded as Javascript literals, then it's fine by definition. But if json() is supposed to be used as an API/exchange format (and not only from (inline?) Javascript), I think we should enforce proper JSON format.
>>>
>>> I am reluctant to call JSON something which might not be.
>>>        
>> Okay, so yeah, I haven't been aware of context either. I needed
>> something to put the data in the HTML, and JSON seemed like a good
>> fit. We chose to implement our own templatefilter rather than
>> importing simplejson because our code is much smaller (doesn't include
>> parser, doesn't have to deal with edge cases). I also thought (and
>> still think) that it might be beneficial for extensions that want to
>> exchange data (I know Mozilla has some JSON-based hgweb pages), but I
>> don't have a solution for the present problem, I'm afraid.
>>
>> Is the conclusion so far that using UTF-8 JSON breaks in a page that
>> has a non-UTF-8 encoding?
>>      
> The spec allows \uXXXX, so I think we should be able to send pure ASCII
> JSON to avoid any possible encoding confusion.
>    

Given that the spec says the encoding is UTF-8, and if you let hgweb 
send the correct headers, there should be no reason to do escaping. I 
don’t think saying ‘It’s UTF-8 period’ is really more confusing than 
saying ‘it’s ASCII’.

From what I understand, the JSON output is wrapped in a JavaScript
method call. Since JavaScript is processed as ISO-8859-1 by default,
having the server send Content-Type: text/javascript;
charset=UTF-8 ought to resolve the encoding problems.
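
To make the header suggestion concrete, here is a rough sketch of a
response that declares its charset explicitly. It is a plain WSGI
callable, not hgweb's actual code; the names app, payload and callback
are made up for illustration, and I use the standard json module rather
than the template filter under discussion.

```python
import json


def app(environ, start_response):
    """Hypothetical WSGI app: serves JSON wrapped in a JavaScript call."""
    # Illustrative data only; \u00e9 is the e-acute in "Mezard".
    payload = {"author": "Patrick M\u00e9zard"}

    # ensure_ascii=False keeps non-ASCII characters as-is instead of
    # turning them into \uXXXX escapes.
    text = "callback(%s);" % json.dumps(payload, ensure_ascii=False)

    # Encode the body as UTF-8 and say so in the header, so the browser
    # does not have to guess the encoding.
    body = text.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/javascript; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The point of the sketch is only that the bytes on the wire and the
declared charset agree; how hgweb would set the header is a separate
question.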

That’s IMO a more elegant solution than escaping everything, and it will 
be easier to read for a human, too.
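
For comparison, a small sketch of the two options using Python's
standard json module (again, not the Mercurial template filter itself;
the sample name is just an example). Both forms decode to the same
string, so the difference is only in what goes over the wire and how
readable it is.

```python
import json

# Sample value containing one non-ASCII character (e-acute).
name = "Patrick M\u00e9zard"

# Option 1: pure ASCII output, non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(name)

# Option 2: characters kept as-is; the result must then be sent as UTF-8.
raw = json.dumps(name, ensure_ascii=False)

# Both representations parse back to the same value.
assert json.loads(escaped) == json.loads(raw) == name

print(escaped)               # the escaped, ASCII-only form
print(raw.encode("utf-8"))   # the UTF-8 bytes of the unescaped form
```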

~Grauw

-- 
~~ Ushiko-san! Kimi wa doushite, Ushiko-san nan da!! ~~
Laurens Holst, developer, Utrecht, the Netherlands
Website: www.grauw.nl. Backbase employee; www.backbase.com



More information about the Mercurial-devel mailing list