[PATCH 2 of 3] templater: replace jsonescape in main json templater (issue4926)

Yuya Nishihara yuya at tcha.org
Fri Jan 15 08:18:26 CST 2016


On Thu, 14 Jan 2016 12:49:27 -0600, Matt Mackall wrote:
> On Thu, 2016-01-14 at 22:12 +0900, Yuya Nishihara wrote:
> > > >  and JSON input (i.e. template string)
> > > > is a local-encoding text in general.
> > > 
> > > encoding.jsonescape (indirectly) knows about localstr objects, and thus
> > > recovers the original UTF-8 text to encode if it exists.
> > 
> > Yes, but localstr is mostly lost in templater,
> 
> Oh? Please elaborate.

I meant a localstr object would soon be converted to a plain str, losing its
original _utf8 bytes. The templater does a bunch of string manipulation.

> >  and toutf8b() takes it as bytes,
> > not as local-encoding text.
> 
> You may be right, but according to the docstring, that means you've found
> a bug,

I thought it was when the utf8b helper was introduced, but I don't think it is
a bug now. toutf8b() must keep the original bytes unmodified, so it can't
take input as local-encoding text. The only thing it can do is to escape
the non-UTF-8 parts to U+DCxx.
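
To illustrate the escaping, here is a minimal sketch using Python 3's
"surrogateescape" and "surrogatepass" error handlers. It is not Mercurial's
implementation; to_utf8b_sketch() and from_utf8b_sketch() are hypothetical
names, and I am assuming the utf8b behaviour is "valid UTF-8 passes through,
every other byte 0xXX becomes U+DCXX".

```python
def to_utf8b_sketch(raw: bytes) -> bytes:
    """Keep valid UTF-8 as is; turn each undecodable byte 0xXX into U+DCXX."""
    # surrogateescape maps every byte that is not part of valid UTF-8
    # to the lone surrogate U+DC00 + byte.
    text = raw.decode("utf-8", "surrogateescape")
    # surrogatepass allows those lone surrogates to be written out
    # as ordinary 3-byte sequences.
    return text.encode("utf-8", "surrogatepass")


def from_utf8b_sketch(escaped: bytes) -> bytes:
    """Undo to_utf8b_sketch() and recover the original bytes exactly."""
    text = escaped.decode("utf-8", "surrogatepass")
    return text.encode("utf-8", "surrogateescape")


latin1_name = b"caf\xe9"                 # not valid UTF-8
utf8_name = "caf\u00e9".encode("utf-8")  # already valid UTF-8

escaped = to_utf8b_sketch(latin1_name)   # b"caf\xed\xb3\xa9", i.e. U+DCE9
restored = from_utf8b_sketch(escaped)    # b"caf\xe9" again
unchanged = to_utf8b_sketch(utf8_name)   # same bytes as utf8_name
```

Note that the function never looks at the local encoding, which is why it
cannot treat its input as local-encoding text.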

> >  1. add option to escape all non-ASCII characters by encoding.jsonescape()
> 
> Ok.
> 
> >  2. add "|utf8" template filter to explicitly convert localstr|str to utf-8
> 
> Sounds suspicious. What's it going to do with filenames?

Filenames are the reason I'm going to introduce the ugly |utf8 filter.

Because filenames are bytes in the Mercurial world, {filename|json} should
preserve the original byte sequence, which means

  {x|json} -> '"' toutf8b(x) '"'

On the other hand, most template strings are in the local encoding. Because the
|json filter has to be byte-transparent to filenames, we need something to
annotate an input as a local string.

  {x|utf8|json} -> '"' toutf8b(fromlocal(x)) '"'
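
The difference between the two pipelines can be sketched in Python 3 as
follows. Every function name here is a hypothetical stand-in rather than the
real templater code, the local encoding is assumed to be latin-1 for the
example, and JSON escaping of quotes, backslashes and control characters is
left out to keep the focus on encoding.

```python
def to_utf8b_sketch(raw: bytes) -> bytes:
    # Stand-in for toutf8b(): valid UTF-8 is kept, any other byte 0xXX
    # is escaped to U+DCXX.
    return raw.decode("utf-8", "surrogateescape").encode("utf-8", "surrogatepass")


def from_local_sketch(text: bytes, local_encoding: str) -> bytes:
    # Stand-in for fromlocal(): reinterpret local-encoding text as UTF-8.
    return text.decode(local_encoding).encode("utf-8")


def json_filter_sketch(x: bytes) -> bytes:
    # {x|json}: byte-transparent, so it suits filenames.
    return b'"' + to_utf8b_sketch(x) + b'"'


def utf8_filter_sketch(x: bytes, local_encoding: str) -> bytes:
    # {x|utf8}: marks x as local-encoding text by converting it to UTF-8.
    return from_local_sketch(x, local_encoding)


value = b"caf\xe9"  # "cafe" with an acute accent, encoded as latin-1

# {x|json}: the 0xe9 byte is not valid UTF-8, so it is escaped to U+DCE9.
as_bytes = json_filter_sketch(value)

# {x|utf8|json}: the text is first converted to UTF-8 (0xc3 0xa9), which
# the json step then passes through unchanged.
as_text = json_filter_sketch(utf8_filter_sketch(value, "latin-1"))
```

With these assumptions as_bytes is b'"caf\xed\xb3\xa9"' and as_text is
b'"caf\xc3\xa9"', so the same input yields different JSON depending on
whether it was annotated as text.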

> >  3. change "|json" to take input as utf8b bytes (BC)
> 
> Sounds like a really big break from our encoding philosophy.

Yep, but |json has always been special in that it converts an input string to
unicode.

The good news is that "json" and "jsonescape" are undocumented, so they are
considered internal filters.

