[PATCH 0 of 1 stable] Re: [PATCH stable] templatefilters: make json filter handle multibyte characters correctly

Yuya Nishihara yuya at tcha.org
Tue Aug 10 12:04:11 CDT 2010


Laurens Holst wrote:
> Op 10-08-10 17:25, Matt Mackall schreef:
> > On Tue, 2010-08-10 at 15:00 +0200, Dirkjan Ochtman wrote:
> >    
> >> On Tue, Aug 10, 2010 at 13:47, Patrick Mézard<pmezard at gmail.com>  wrote:
> >>      
> >>> The situation and use cases are really unclear to me. I thought at first the motivation for json in template filters came from Benjamin Smedberg (http://markmail.org/message/2zlxjatcqha4cdrs#query:json%20mercurial+page:1+mid:2zlxjatcqha4cdrs+state:results) as an exchange format, while annotate reveals it was introduced by you for the web graph view (and that Benjamin's original need was similar).
> >>>
> >>> So, assuming browsers can deal with random byte strings encoded as Javascript literals, then it's fine by definition. But if json() is supposed to be used as an API/exchange format (and not only from (inline?) Javascript), I think we should enforce proper JSON format.
> >>>
> >>> I am reluctant to call JSON something which might not be.
> >>>        
> >> Okay, so yeah, I haven't been aware of context either. I needed
> >> something to put the data in the HTML, and JSON seemed like a good
> >> fit. We chose to implement our own templatefilter rather than
> >> importing simplejson because our code is much smaller (doesn't include
> >> parser, doesn't have to deal with edge cases). I also thought (and
> >> still think) that it might be beneficial for extensions that want to
> >> exchange data (I know Mozilla has some JSON-based hgweb pages), but I
> >> don't have a solution for the present problem, I'm afraid.
> >>
> >> Is the conclusion so far that using UTF-8 JSON breaks in a page that
> >> has a non-UTF-8 encoding?
> >>      
> > The spec allows \uXXXX, so I think we should be able to send pure ASCII
> > JSON to avoid any possible encoding confusion.
> >    
> 
> Given that the spec says the encoding is UTF-8, and if you let hgweb 
> send the correct headers, there should be no reason to do escaping. I 
> don’t think saying ‘It’s UTF-8 period’ is really more confusing than 
> saying ‘it’s ASCII’.
> 
> From what I understand the JSON output is wrapped in a JavaScript 
> method call; JavaScript being processed as ISO-8859-1 by default, in 
> that case having the server send Content-Type: text/javascript; 
> charset=UTF-8 ought to resolve the encoding problems.

We should not set charset=UTF-8 here, because the charset hgweb
declares also has to match the encoding of the repository files it
serves. hgweb doesn't convert the encoding of files, so it's reasonable
to set hgweb's encoding to the file encoding, not UTF-8.
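
To illustrate the pure-ASCII option Matt mentions, here is a minimal
sketch, not the actual templatefilters code. The function name and the
`encoding` parameter are made up for illustration; `encoding` stands in
for whatever encoding hgweb is configured with. It assumes the input is
a byte string in that encoding and writes every non-ASCII character as
a \uXXXX escape, so the result is the same whatever charset the page
declares.

```python
import json


def json_ascii_string(raw, encoding="utf-8"):
    """Return a JSON string literal made of ASCII characters only.

    `raw` is a byte string in `encoding` (hypothetical parameter, see
    above). Undecodable bytes become U+FFFD instead of raising.
    """
    text = raw.decode(encoding, "replace")
    escapes = {'"': '\\"', '\\': '\\\\',
               '\n': '\\n', '\r': '\\r', '\t': '\\t'}
    out = ['"']
    for ch in text:
        cp = ord(ch)
        if ch in escapes:
            out.append(escapes[ch])
        elif 0x20 <= cp < 0x7f:
            # printable ASCII passes through unchanged
            out.append(ch)
        elif cp > 0xffff:
            # outside the BMP: JSON wants a UTF-16 surrogate pair
            cp -= 0x10000
            out.append('\\u%04x\\u%04x'
                       % (0xd800 + (cp >> 10), 0xdc00 + (cp & 0x3ff)))
        else:
            # control characters and everything else in the BMP
            out.append('\\u%04x' % cp)
    out.append('"')
    return ''.join(out)


# The same text in two encodings gives the same ASCII-only literal.
latin1 = json_ascii_string(u'M\u00e9zard'.encode('latin-1'), 'latin-1')
utf8 = json_ascii_string(u'M\u00e9zard'.encode('utf-8'), 'utf-8')
assert latin1 == utf8
assert json.loads(latin1) == u'M\u00e9zard'
```

The catch is the decode step: it only works if the declared encoding
really matches the bytes, which is the same problem as above for file
contents.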

Yuya

