[PATCH 2 of 2] py3: handle multiple arguments in .encode() and .decode()

Yuya Nishihara yuya at tcha.org
Fri Oct 7 00:58:47 EDT 2016


On Wed, 05 Oct 2016 20:05:18 +0530, Pulkit Goyal wrote:
> # HG changeset patch
> # User Pulkit Goyal <7895pulkit at gmail.com>
> # Date 1475596407 -19800
> #      Tue Oct 04 21:23:27 2016 +0530
> # Node ID 535c77a356a09c0319c9a794bdbec18e9ebb57b2
> # Parent  51e49c041614b463953b3973d5b58d8bbdcbbab3
> py3: handle multiple arguments in .encode() and .decode()
> 
> There is at least one case, and there may be more, where these functions
> take multiple arguments. Our transformer used to handle only the first
> argument, so a loop was added to handle any further arguments.
> 
> diff -r 51e49c041614 -r 535c77a356a0 mercurial/__init__.py
> --- a/mercurial/__init__.py	Tue Oct 04 20:56:03 2016 +0530
> +++ b/mercurial/__init__.py	Tue Oct 04 21:23:27 2016 +0530
> @@ -278,24 +278,30 @@
>                  # .encode() and .decode() on str/bytes/unicode don't accept
>                  # byte strings on Python 3. Rewrite the token to include the
>                  # unicode literal prefix so the string transformer above doesn't
> -                # add the byte prefix.
> +                # add the byte prefix. The loop helps in handling multiple
> +                # arguments to them.
>                  if (fn in ('encode', 'decode') and
>                      prevtoken.type == token.OP and prevtoken.string == '.'):
>                      # (OP, '.')
>                      # (NAME, 'encode')
>                      # (OP, '(')
>                      # (STRING, 'utf-8')
> +                    # [(OP, ',')]
> +                    # [(STRING, 'ascii')]
>                      # (OP, ')')
> -                    try:
> -                        st = tokens[i + 2]
> -                        if (st.type == token.STRING and
> -                            st.string[0] in ("'", '"')):
> -                            rt = tokenize.TokenInfo(st.type, 'u%s' % st.string,
> -                                                    st.start, st.end, st.line)
> -                            tokens[i + 2] = rt
> -                    except IndexError:
> -                        pass
> -
> +                    j = i
> +                    while (tokens[j + 1].string != ')'):
> +                        try:
> +                            st = tokens[j + 2]
> +                            if (st.type == token.STRING and
> +                                st.string[0] in ("'", '"')):
> +                                rt = tokenize.TokenInfo(st.type,
> +                                    'u%s' % st.string,
> +                                        st.start, st.end, st.line)
> +                                tokens[j + 2] = rt
> +                        except IndexError:
> +                            pass

Perhaps IndexError could be raised at the first tokens[j + 1] access. Since we
have a "while" loop, the bound could be written as j + 2 < len(tokens) in the
loop condition instead of relying on try/except.

Also, we'll need to check the existence of ',' token.
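Something along these lines, perhaps. This is only a rough sketch of the two
suggestions above (bounds check in the loop condition, and verifying that the
preceding token is '(' or ',' before treating the next token as an argument),
not the actual Mercurial transformer; the helper name `_uprefix_encode_args`
is made up, and it doesn't try to handle nested calls inside the argument
list:

```python
import io
import token
import tokenize

def _uprefix_encode_args(tokens):
    """Add a u'' prefix to string-literal arguments of .encode()/.decode().

    Sketch only: walks (OP '(') STRING [(OP ',') STRING]* (OP ')') after
    the method name, bounding the loop by len(tokens) instead of catching
    IndexError, and checking for the ',' token between arguments.
    """
    for i, t in enumerate(tokens):
        if t.type != token.NAME or t.string not in ('encode', 'decode'):
            continue
        # must be a method call: preceded by (OP, '.')
        if i == 0 or tokens[i - 1][:2] != (token.OP, '.'):
            continue
        j = i
        while j + 2 < len(tokens) and tokens[j + 1].string != ')':
            # only step into a real argument slot, i.e. after '(' or ','
            if tokens[j + 1].string not in ('(', ','):
                break
            st = tokens[j + 2]
            if st.type == token.STRING and st.string[0] in ("'", '"'):
                tokens[j + 2] = st._replace(string='u%s' % st.string)
            j += 2  # step over the argument and the following ',' (if any)
    return tokens

src = "s.encode('utf-8', 'ignore')\n"
toks = list(tokenize.generate_tokens(io.StringIO(src).readline))
out = tokenize.untokenize(_uprefix_encode_args(toks))
```

Running this on the two-argument example rewrites both string literals, and
non-string arguments (e.g. a variable holding the codec name) are left alone
because of the token.STRING check.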

>      # ``replacetoken`` or any mechanism that changes semantics of module
>      # loading is changed. Otherwise cached bytecode may get loaded without
>      # the new transformation mechanisms applied.
> -    BYTECODEHEADER = b'HG\x00\x02'
> +    BYTECODEHEADER = b'HG\x00\x04'

Just curious, why not '\x03' ?

