[PATCH 01 of 10] py3: use unicode literals in encoding.py

Gregory Szorc gregory.szorc at gmail.com
Wed Aug 3 11:31:26 EDT 2016



> On Aug 3, 2016, at 08:18, FUJIWARA Katsunori <foozy at lares.dti.ne.jp> wrote:
> 
> At Wed, 3 Aug 2016 13:33:12 +0100,
> Jun Wu wrote:
>> 
>> I think we may want special handling for things like os.environ in the
>> transformer instead. IIUC the point of the transformer approach is to
>> reduce the need for these kinds of fixups.
> 
> As part of enabling demandimport on Python 3.x, I'm working on skipping
> code transformation for demandimport.py with changes like the one below:
> 
> diff -r cf6739a27b8f mercurial/__init__.py
> --- a/mercurial/__init__.py     Wed Aug 03 22:34:54 2016 +0900
> +++ b/mercurial/__init__.py     Wed Aug 03 22:47:17 2016 +0900
> @@ -310,6 +310,10 @@
>         The added header has the form ``HG<VERSION>``. That is a literal
>         ``HG`` with 2 binary bytes indicating the transformation version.
>         """
> +        _notransform = set([
> +            'mercurial.demandimport',
> +        ])
> +
>         def get_data(self, path):
>             data = super(hgloader, self).get_data(path)
> 
> @@ -336,9 +340,10 @@
> 
>         def source_to_code(self, data, path):
>             """Perform token transformation before compilation."""
> -            buf = io.BytesIO(data)
> -            tokens = tokenize.tokenize(buf.readline)
> -            data = tokenize.untokenize(replacetokens(list(tokens)))
> +            if self.name not in self._notransform:
> +                buf = io.BytesIO(data)
> +                tokens = tokenize.tokenize(buf.readline)
> +                data = tokenize.untokenize(replacetokens(list(tokens)))
>             # Python's built-in importer strips frames from exceptions raised
>             # for this code. Unfortunately, that mechanism isn't extensible
>             # and our frame will be blamed for the import failure. There
> 
> 
> If (almost) all operations on string literals in the target source
> code require unicode on Python 3.x, skipping the transformation
> reduces the need to add an explicit 'u' prefix to existing string
> literals.
> 
> For example, all operations on string literals in demandimport.py
> involve the APIs below, which accept only unicode (as str) on Python
> 3.x.
> 
>  - manipulate module name
>    split(), formatting with "%s", __contains__(), and so on
>  - access to attributes by name
>  - access to values in os.environ
>  - access to values in sys.builtin_module_names
> 
> pycompat.py and i18n.py also seem to work without the transformation.
> At a quick glance, pure/osutil.py might too (though a few extra
> explicit 'b' prefixes might be needed).
> 
> What do you think about skipping the transformation this way?
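
The str-only behavior of the APIs listed above can be checked with a small sketch like the one below. It is only an illustration on stock Python 3: the Dummy class, the accepts() helper, and the 'HGDEMANDIMPORT' key are names picked for the example, not code from demandimport.py.

```python
import os
import sys


class Dummy:
    """Hypothetical stand-in for a module object."""
    attr = 1


def accepts(func, *args):
    """Return True if func(*args) runs without raising TypeError."""
    try:
        func(*args)
    except TypeError:
        return False
    return True


# Module name manipulation: bytes and str cannot be mixed in split().
str_split_ok = accepts('mercurial.util'.split, '.')
bytes_split_ok = accepts(b'mercurial.util'.split, '.')

# Attribute access by name needs str, not bytes.
str_attr_ok = accepts(getattr, Dummy, 'attr')
bytes_attr_ok = accepts(getattr, Dummy, b'attr')

# os.environ lookups need str keys (the key here is just an example).
str_env_ok = accepts(os.environ.get, 'HGDEMANDIMPORT')
bytes_env_ok = accepts(os.environ.get, b'HGDEMANDIMPORT')

# sys.builtin_module_names holds str, so a bytes name never matches.
str_builtin = 'sys' in sys.builtin_module_names
bytes_builtin = b'sys' in sys.builtin_module_names
```

In each pair the str variant works and the bytes variant either raises TypeError or silently fails to match, which is what a blanket b'' rewrite would run into in such a module.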

I can go either way. On one hand, not doing the transformation is ideal: the transformation is a giant hack to make porting more manageable. On the other hand, consistency is also good. Having to remember which modules are transformed and which aren't could be painful.

I like the idea of something in the file that would tell the loader not to transform. And I think we have something already: "from __future__ import unicode_literals". That would use Unicode types everywhere, though, which isn't wanted when interfacing with certain Python APIs. So maybe we could put a special comment at the top of the file? "# hgnotransform" or some such.
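
To make the comment idea concrete, here is a rough sketch of how a loader could honor such a marker. Everything in it is an assumption for illustration: the marker spelling, the five-line window, and the function names are made up, and add_bytes_prefix() is a much simplified stand-in for the real token rewriting, not the code in mercurial/__init__.py.

```python
import io
import tokenize

# Hypothetical marker; the exact spelling is not settled anywhere.
MARKER = b'# hgnotransform'


def wants_transform(data, maxlines=5):
    """Return False if one of the first few lines is the opt-out marker."""
    for line in data.splitlines()[:maxlines]:
        if line.strip() == MARKER:
            return False
    return True


def add_bytes_prefix(data):
    """Simplified stand-in for the loader's rewriting: put a b prefix on
    string literals that have no prefix. Unlike a real implementation it
    does not special-case docstrings or anything else."""
    out = []
    for tok in tokenize.tokenize(io.BytesIO(data).readline):
        if tok.type == tokenize.STRING and tok.string[0] in ('"', "'"):
            tok = tok._replace(string='b' + tok.string)
        out.append(tok)
    # The leading ENCODING token makes untokenize() return bytes.
    return tokenize.untokenize(out)


def maybe_transform(data):
    """Rewrite the source unless the file opted out via the marker."""
    if wants_transform(data):
        return add_bytes_prefix(data)
    return data
```

With something like this, source_to_code() would call maybe_transform(data) instead of consulting a hard-coded module list, so the opt-out lives next to the code it affects.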

> 
> 
>> Excerpts from Pulkit Goyal's message of 2016-08-03 01:57:23 +0530:
>>> # HG changeset patch
>>> # User Pulkit Goyal <7895pulkit at gmail.com>
>>> # Date 1470161385 -19800
>>> #      Tue Aug 02 23:39:45 2016 +0530
>>> # Node ID c03543a126719097a1a61c8e5ef5fcb222262315
>>> # Parent  73ff159923c1f05899c27238409ca398342d9ae0
>>> py3: use unicode literals in encoding.py
>>> 
>>> The custom module loader adds a b'' prefix everywhere, making every string literal bytes. There are some places
>>> where we need unicode strings. This patch deals with such places in encoding.py. It also
>>> updates the output of test-check-py3-compat.t in some places that had been left unchanged.
>>> 
>>> This series of patches is the work of Gregory Szorc and is taken from https://hg.mozilla.org/users/gszorc_mozilla.com/hg/shortlog/py3 .
> 
> ----------------------------------------------------------------------
> [FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp

