[PATCH STABLE] phabricator: convert unicode to binary when writing patches

Yuya Nishihara yuya at tcha.org
Fri Jul 28 09:21:22 EDT 2017


On Thu, 27 Jul 2017 23:54:34 -0700, Gregory Szorc wrote:
> On Thu, Jul 27, 2017 at 12:03 PM, Jun Wu <quark at fb.com> wrote:
> 
> > # HG changeset patch
> > # User Jun Wu <quark at fb.com>
> > # Date 1501182181 25200
> > #      Thu Jul 27 12:03:01 2017 -0700
> > # Branch stable
> > # Node ID e3b08c0b6d7ca92f0d8b741585338b68ed0d8c48
> > # Parent  c5607b65fcb8cf5b789c49a8cf4fecfe83931727
> > phabricator: convert unicode to binary when writing patches
> >
> > This is a quick fix to make `hg phabread D189` work.
> >
> > It seems we might want to replace all `r''` to `u''`, and add more
> > `encoding.*to*` to be more explicit when interacting with `json` module.
> >
> 
> This brings back bad memories for me:
> https://bugzilla.mozilla.org/show_bug.cgi?id=1249004#c5
> 
> Essentially, since pretty much everything in the world tries to ensure that
> JSON is UTF-8, it is not uncommon to have non-UTF-8 byte sequences mangled
> somewhere in a JSON exchange. After trying to coerce Python's stdlib json
> module on both the client and server, I figured it was best to side-step
> the issue by base64 encoding the diff bytes before they hit the JSON layer.
> JSON just isn't a good format for shipping binary around. The problem is
> compounded by Python's stdlib implementation making certain choices.
> 
> I'm curious what will happen if you attempt to import
> https://reviewboard-hg.mozilla.org/gecko/rev/4bf2ecabfd98 into Phabricator.
> The bytes between the quotes (0x22) are supposed to be 0xffff7e.

Phabricator has an open issue for that.

https://secure.phabricator.com/T8669
https://secure.phabricator.com/T5955

JSON is horrible choice for serializing a text (in Unix sense.) So is
Python 3.

> > -        write(('%s%s\n%s') % (header, desc, body))
> > +        content = '%s%s\n%s' % (header, desc, body)
> > +        write(encoding.unitolocal(content))

Strictly speaking, encoding.unitolocal(header + desc) makes sense, but
body doesn't.


More information about the Mercurial-devel mailing list