[PATCH 1 of 3 V3] py3: make a bytes version of getopt.getopt()

Pulkit Goyal 7895pulkit at gmail.com
Fri Dec 9 06:01:20 EST 2016


On Fri, Dec 9, 2016 at 4:16 PM, Yuya Nishihara <yuya at tcha.org> wrote:
> On Fri, 09 Dec 2016 00:42:37 +0530, Pulkit Goyal wrote:
>> # HG changeset patch
>> # User Pulkit Goyal <7895pulkit at gmail.com>
>> # Date 1480986396 -19800
>> #      Tue Dec 06 06:36:36 2016 +0530
>> # Node ID e6e1c531a879c091caeaf7597744e98bcfbb41c9
>> # Parent  a2b053b8d31aa01b1dcae2d3001b060ff59e8a68
>> py3: make a bytes version of getopt.getopt()
>
>> +    # getopt.getopt() on Python 3 deals with unicodes internally so we cannot
>> +    # pass bytes there. Passing unicodes will result in unicodes as return
>> +    # values which we need to convert again to bytes.
>> +    def getoptb(args, shortlist, namelist):
>> +        # There are chances when args, shortlist or namelist variables can be
>> +        # unicodes, because maybe they are result of sys.argv like in statprof
>> +        # or some other reasons. So it's better to check instance rather than
>> +        # getting an AttributeError.
>> +        args = [a.decode('latin-1') if isinstance(a, bytes) else a
>> +                                    for a in args]
>> +        if isinstance(shortlist, bytes):
>> +            shortlist = shortlist.decode('latin-1')
>> +        namelist = [a.decode('latin-1') if isinstance(a, bytes) else a
>> +                                    for a in namelist]
>
> IMO, passing unicode variables is invalid use of getoptb(). If they were
> unicode type, it would be wrong to convert them back by .encode('latin-1').
> We have no idea what's the expected encoding.

Okay, that means getoptb() is expected to get bytes only. Since we will
only be passing bytes, we can safely use .encode('latin-1') on the
results, because we decoded them the same way before passing them into
getopt.getopt(). I will resend a V4.
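
To illustrate why the round trip is safe for bytes but not for arbitrary
unicode input, here is a small sketch. The helper names roundtrip() and
encodes_to_latin1() are made up for this example and are not part of the
patch:

```python
# latin-1 maps each byte value 0..255 to exactly one code point, so
# decoding bytes and encoding the result gives back the original bytes.
def roundtrip(b):
    return b.decode('latin-1').encode('latin-1')

all_bytes = bytes(range(256))
assert roundtrip(all_bytes) == all_bytes

# The other direction is not guaranteed: a unicode string that did not
# come from a latin-1 decode may contain code points above U+00FF.
def encodes_to_latin1(s):
    try:
        s.encode('latin-1')
        return True
    except UnicodeEncodeError:
        return False
```

So as long as the caller only passes bytes, the encode step cannot fail
and cannot change the data.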

>> +        opts, args = getopt.getopt(args, shortlist, namelist)
>> +        # Returned value is always str, so no need to check instance.
>> +        opts = [(a[0].encode('latin-1'), a[1].encode('latin-1'))
>> +                            for a in opts]
>> +        args = [a.encode('latin-1') for a in args]
>> +        return opts, args
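
For reference, a bytes-only getoptb() along the lines discussed above
might look roughly like the following on Python 3. This is only a sketch
of the idea, assuming every argument is bytes. It is not the V4 patch
itself:

```python
import getopt

def getoptb(args, shortlist, namelist):
    # Callers are assumed to pass bytes only, so decode unconditionally.
    # A str argument fails here with AttributeError instead of being
    # silently accepted.
    args = [a.decode('latin-1') for a in args]
    shortlist = shortlist.decode('latin-1')
    namelist = [a.decode('latin-1') for a in namelist]
    opts, args = getopt.getopt(args, shortlist, namelist)
    # getopt.getopt() returns str on Python 3; encode back the same way
    # the inputs were decoded.
    opts = [(o.encode('latin-1'), v.encode('latin-1')) for o, v in opts]
    args = [a.encode('latin-1') for a in args]
    return opts, args
```

For example, getoptb([b'-v', b'--rev', b'tip', b'file'], b'v', [b'rev='])
gives back the options and the remaining arguments as bytes.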


More information about the Mercurial-devel mailing list