[PATCH 5 of 6 py3] dispatch: enforce bytes when converting boolean flags to config items

Yuya Nishihara yuya at tcha.org
Wed Mar 8 10:43:05 EST 2017


On Wed, 8 Mar 2017 00:48:58 -0800, Gregory Szorc wrote:
> > On Mar 7, 2017, at 19:08, Augie Fackler <raf at durin42.com> wrote:
> >> On Mar 7, 2017, at 22:07, Durham Goode <durham at fb.com> wrote:
> >>> On 3/7/17 8:25 AM, Augie Fackler wrote:
> >>> # HG changeset patch
> >>> # User Augie Fackler <raf at durin42.com>
> >>> # Date 1488570207 18000
> >>> #      Fri Mar 03 14:43:27 2017 -0500
> >>> # Node ID 4801067dee2c77ff4e720c931d8b19cf32515beb
> >>> # Parent  a6e8bb19707e0c7505ccfdf44f7e1b19a0f65d48
> >>> dispatch: enforce bytes when converting boolean flags to config items
> >>> 
> >>> This fixes --verbose on Python 3.
> >>> 
> >>> diff --git a/mercurial/dispatch.py b/mercurial/dispatch.py
> >>> --- a/mercurial/dispatch.py
> >>> +++ b/mercurial/dispatch.py
> >>> @@ -744,6 +744,8 @@ def _dispatch(req):
> >>>        if options['verbose'] or options['debug'] or options['quiet']:
> >>>            for opt in ('verbose', 'debug', 'quiet'):
> >>>                val = str(bool(options[opt]))
> >>> +                if pycompat.ispy3:
> >>> +                    val = val.encode('latin1')
> >> 
> >> Should we have a util function for turning str() output into bytes? Or even a strbytes() function? On py2 it could just return str. My encoding knowledge is approximately zero, which is why I'd love to be able to choose from some easy functions like `util.tobytesfromstr()` instead of knowing that encode('latin1') is how I get ascii bytes.
> > 
> > I'm not sure - in this case I knew latin1 was safe because it's the repr() of a bool, but I don't know if there's a general-purpose solution possible here. Yuya might have an idea though?
> 
> The latin1 encoding is an identity encoding: it will pass byte values through unchanged without any validation. This is useful for coercing certain byte sequences to different Python types. I've used it to round trip raw byte sequences to a Unicode type then back out to bytes again to placate some API that insists on speaking Unicode.
> 
> I'm not 100% sure what happens when you .encode('latin1') a unicode type that has actual Unicode in it. I'm guessing the internal buffer will be emitted. And, after PEP 393, the internal representation can be stored a few different ways. If that is the behavior, then .encode('latin1') on any "unknown" value isn't "safe."

Yes. bytes.decode('latin1') is exception free, but unicode.encode('latin1')
isn't: it raises UnicodeEncodeError for any code point above U+00FF, as
unicode has a much larger code space than latin1.
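
A quick way to see the asymmetry on Python 3 (just an illustrative sketch,
not part of the patch; the sample character is arbitrary):

```python
# Every byte value 0..255 maps to the code point with the same number,
# so decoding arbitrary bytes as latin1 cannot fail and round-trips.
raw = bytes(range(256))
text = raw.decode('latin1')
assert text.encode('latin1') == raw

# The case from the patch: str(bool(...)) is pure ASCII, so it encodes fine.
val = str(bool(1)).encode('latin1')

# A code point above U+00FF has no latin1 byte, so encoding raises
# instead of passing some internal buffer through.
error = None
try:
    u'\u3042'.encode('latin1')
except UnicodeEncodeError as exc:
    error = exc
```

So encode('latin1') is fine for values known to be ASCII, such as the str()
of a bool, but not for arbitrary unicode input.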

I'll send a PoC patch of a utility function.
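
Roughly, such a helper might look like the sketch below. The name
strtobytes and the local ispy3 flag are hypothetical stand-ins for
illustration only, not the actual PoC:

```python
import sys

# Stand-in for pycompat.ispy3 so this sketch is self-contained.
ispy3 = sys.version_info[0] >= 3

def strtobytes(s):
    """Hypothetical helper: turn str() output known to be ASCII into bytes."""
    if not ispy3:
        return s  # on Python 2, str is already a byte string
    # Assumption: 'ascii' is used instead of 'latin1' so that unexpected
    # non-ASCII input raises UnicodeEncodeError rather than slipping through.
    return s.encode('ascii')

val = strtobytes(str(bool(0)))
```

Using a strict codec here is one possible design choice; it keeps callers
from having to know which encoding is safe for a given value.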

