[PATCH 01 of 10] py3: use unicode literals in encoding.py

Wed Aug 3 13:25:15 EDT 2016

At Wed, 3 Aug 2016 08:31:26 -0700,
Gregory Szorc wrote:
> 
> > On Aug 3, 2016, at 08:18, FUJIWARA Katsunori <foozy at lares.dti.ne.jp> wrote:
> > 
> > At Wed, 3 Aug 2016 13:33:12 +0100,
> > Jun Wu wrote:
> >> 
> >> I think we may want special handling for things like os.environ in the
> >> transformer instead. IIUC, the point of choosing the transformer approach
> >> was to reduce the need for these kinds of fixups.
> > 
> > As part of enabling demandimport on Python 3.x, I'm working on skipping
> > the code transformation for demandimport.py with a change like the one below:
> > 
> > diff -r cf6739a27b8f mercurial/__init__.py
> > --- a/mercurial/__init__.py     Wed Aug 03 22:34:54 2016 +0900
> > +++ b/mercurial/__init__.py     Wed Aug 03 22:47:17 2016 +0900
> > @@ -310,6 +310,10 @@
> >         The added header has the form ``HG<VERSION>``. That is a literal
> >         ``HG`` with 2 binary bytes indicating the transformation version.
> >         """
> > +        _notransform = set([
> > +            'mercurial.demandimport',
> > +        ])
> > +
> >         def get_data(self, path):
> >             data = super(hgloader, self).get_data(path)
> > 
> > @@ -336,9 +340,10 @@
> > 
> >         def source_to_code(self, data, path):
> >             """Perform token transformation before compilation."""
> > -            buf = io.BytesIO(data)
> > -            tokens = tokenize.tokenize(buf.readline)
> > -            data = tokenize.untokenize(replacetokens(list(tokens)))
> > +            if self.name not in self._notransform:
> > +                buf = io.BytesIO(data)
> > +                tokens = tokenize.tokenize(buf.readline)
> > +                data = tokenize.untokenize(replacetokens(list(tokens)))
> >             # Python's built-in importer strips frames from exceptions raised
> >             # for this code. Unfortunately, that mechanism isn't extensible
> >             # and our frame will be blamed for the import failure. There
> > 
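
For readers who have not looked at the loader: the step that the diff above
makes conditional can be sketched roughly as follows. This is a simplified
stand-in, not the real replacetokens(); the function name add_bytes_prefix()
is made up for illustration, and it only prefixes string literals that carry
no prefix of their own.

```python
import io
import tokenize


def add_bytes_prefix(data: bytes) -> bytes:
    """Prefix every unprefixed string literal in ``data`` with ``b``.

    Hypothetical, simplified illustration of the token transformation;
    the real loader applies more rules than this.
    """
    out = []
    for tok in tokenize.tokenize(io.BytesIO(data).readline):
        # Literals that already start with a prefix (u'', b'', r'', ...)
        # are left alone; only bare '...' and "..." are rewritten.
        if tok.type == tokenize.STRING and tok.string[0] in ("'", '"'):
            tok = tok._replace(string="b" + tok.string)
        out.append(tok)
    # The token stream starts with an ENCODING token, so untokenize()
    # returns bytes.
    return tokenize.untokenize(out)


transformed = add_bytes_prefix(b"x = 'a'\ny = u'b'\n")
```

With _notransform in place, modules listed there would simply bypass a step
like this and keep their literals as written.
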
> > 
> > If (almost) all operations on string literals in the target source
> > code require unicode (str) on Python 3.x, skipping the transformation
> > avoids adding an explicit 'u' prefix to existing string literals.
> > 
> > For example, all operations on string literals in demandimport.py
> > involve the APIs below, which accept only unicode (str) on Python
> > 3.x.
> > 
> >  - manipulate module name
> >    split(), formatting with "%s", __contains__(), and so on
> >  - access to attributes by name
> >  - access to values in os.environ
> >  - access to values in sys.builtin_module_names
> > 
> > pycompat.py and i18n.py also seem to work without the transformation.
> > At a glance, pure/osutil.py may, too (though a few explicit 'b'
> > prefixes might be needed).
> > 
> > How about skipping the transformation this way?
> 
> I can go either way. On one hand, not doing the transformation is
> ideal: the transformation is a giant hack to make porting more
> manageable. On the other, consistency is also good. Having to
> remember which modules are transformed and which aren't could be
> painful.
> 
> I like the idea of something in the file that would tell the loader
> not to transform. And I think we have something already: "from
> __future__ import unicode_literals." Although that would use Unicode
> types everywhere, which isn't wanted when interfacing with certain
> Python APIs. So maybe we could throw a special comment at the top of
> the file? "# hgnotransform" or some such.
> 

Yeah, marking the files themselves is better than maintaining a blacklist
(whitelist?)!

I'll try to work in that direction.
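
As a starting point, a minimal sketch of how the loader could detect such a
marker. It assumes the "# hgnotransform" comment suggested above; the helper
name wants_transform() and the rule "the marker must appear before the first
line of code" are my own assumptions, not an agreed design.

```python
import io
import tokenize

MARKER = "hgnotransform"  # marker name as floated in this thread


def wants_transform(data: bytes) -> bool:
    """Return False if a '# hgnotransform' comment precedes any code.

    Hypothetical helper: only comments at the top of the file are
    considered, so a marker further down has no effect.
    """
    skippable = (tokenize.ENCODING, tokenize.NL, tokenize.NEWLINE)
    for tok in tokenize.tokenize(io.BytesIO(data).readline):
        if tok.type == tokenize.COMMENT:
            if tok.string.lstrip("#").strip() == MARKER:
                return False
        elif tok.type not in skippable:
            # First real token (code or docstring): stop looking.
            return True
    return True
```

source_to_code() could then call this on the raw bytes and run the token
transformation only when it returns True, which would replace the
_notransform set in the diff above.
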

> > 
> > 
> >> Excerpts from Pulkit Goyal's message of 2016-08-03 01:57:23 +0530:
> >>> # HG changeset patch
> >>> # User Pulkit Goyal <7895pulkit at gmail.com>
> >>> # Date 1470161385 -19800
> >>> #      Tue Aug 02 23:39:45 2016 +0530
> >>> # Node ID c03543a126719097a1a61c8e5ef5fcb222262315
> >>> # Parent  73ff159923c1f05899c27238409ca398342d9ae0
> >>> py3: use unicode literals in encoding.py
> >>> 
> >>> The custom module loader adds a b'' prefix everywhere, making every string
> >>> literal bytes. In some places we need unicode strings instead. This patch
> >>> handles such places in encoding.py. It also updates the output of
> >>> test-check-py3-compat.t in places that had been left unchanged.
> >>> 
> >>> This series of patches is the work of Gregory Szorc and is taken from
> >>> https://hg.mozilla.org/users/gszorc_mozilla.com/hg/shortlog/py3 .
> > 
> > ----------------------------------------------------------------------
> > [FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp
> 

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp