[PATCH 07 of 11] py3: byteify the decoded JSON responses upon receipt in the LFS blobstore

Sat Feb 2 20:52:43 EST 2019

On Sat, 02 Feb 2019 13:18:37 -0500, Matt Harbison wrote:
> On Mon, 28 Jan 2019 06:59:44 -0500, Yuya Nishihara <yuya at tcha.org> wrote:
> 
> > On Mon, 28 Jan 2019 00:20:53 -0500, Matt Harbison wrote:
> >> # HG changeset patch
> >> # User Matt Harbison <matt_harbison at yahoo.com>
> >> # Date 1548629295 18000
> >> #      Sun Jan 27 17:48:15 2019 -0500
> >> # Node ID b98988169d4a9c7890b93091683fa4ec38d61a47
> >> # Parent  1d6f4c32abc28ea54e3d1d8487a1d773033aedf0
> >> py3: byteify the decoded JSON responses upon receipt in the LFS blobstore
> >
> >> -        return response
> >> +        def encodestr(x):
> >> +            if isinstance(x, unicode):
> >
> > Fixed s/unicode/pycompat.unicode/ in flight.
> >
> >> +                return x.encode(u'utf-8')
> >> +            return x
> >> +
> >> +        return pycompat.rapply(encodestr, response)
> >
> > I assume here JSON strings are encoding agnostic (i.e. ASCII.) If the
> > JSON had a filename for example, it wouldn't be always correct to encode
> > data as UTF-8.
> 
> Somewhere along the line, I got it in my head that the spec explicitly
> said the JSON payload was utf-8 encoded.  Of course, I can't find that
> now, and the test-lfs-test-server.t#git-server output doesn't have a
> charset in the header for the JSON exchange.

JSON should be utf-8, but IIUC, we're translating JSON back into the Mercurial
world. So if the response contains a unicode string that Mercurial treats as a
platform byte string, x.encode(u'utf-8') shouldn't be used. That's the point.
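
To make the concern concrete, here is a rough Python 3 sketch. The rapply()
below is a simplified stand-in written for illustration, not the real
pycompat.rapply, and the sample payloads and the latin-1 "local encoding" are
assumptions rather than anything taken from the LFS tests. It shows that
byteifying is harmless for ASCII-only values, but that a non-ASCII name comes
out as different bytes depending on which encoding is applied.

```python
import json


def rapply(f, xs):
    """Simplified, hypothetical stand-in for pycompat.rapply: apply f to
    every leaf of nested dicts, lists and tuples."""
    if isinstance(xs, dict):
        return {rapply(f, k): rapply(f, v) for k, v in xs.items()}
    if isinstance(xs, (list, tuple)):
        return type(xs)(rapply(f, x) for x in xs)
    return f(xs)


def encodestr(x):
    # Same shape as the helper in the patch, with str playing the role
    # of pycompat.unicode on Python 3.
    if isinstance(x, str):
        return x.encode('utf-8')
    return x


# ASCII-only payload (made-up sample): every encoding agrees on the bytes.
response = json.loads('{"objects": [{"oid": "abc123", "size": 12}]}')
byteified = rapply(encodestr, response)

# Non-ASCII value such as a filename: the bytes depend on the encoding.
# latin-1 stands in here for some platform-local encoding (an assumption).
name = json.loads('"caf\\u00e9.txt"')
utf8_bytes = name.encode('utf-8')      # b'caf\xc3\xa9.txt'
local_bytes = name.encode('latin-1')   # b'caf\xe9.txt'
```

If Mercurial expects the second form for such a value, blindly encoding as
UTF-8 hands it the first one, which no longer matches the platform byte string.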