[PATCH 01 of 10] py3: use unicode literals in encoding.py
FUJIWARA Katsunori
foozy at lares.dti.ne.jp
Wed Aug 3 13:25:15 EDT 2016
At Wed, 3 Aug 2016 08:31:26 -0700,
Gregory Szorc wrote:
>
> > On Aug 3, 2016, at 08:18, FUJIWARA Katsunori <foozy at lares.dti.ne.jp> wrote:
> >
> > At Wed, 3 Aug 2016 13:33:12 +0100,
> > Jun Wu wrote:
> >>
> >> I think we may want special handling for things like os.environ in the
> >> transformer instead. IIUC, the point of the transformer approach is to
> >> reduce the need for these kinds of fixups.
> >
> > As a part of enabling demandimport on Python 3.x, I'm working to omit
> > code transformation for demandimport.py by changes like below:
> >
> > diff -r cf6739a27b8f mercurial/__init__.py
> > --- a/mercurial/__init__.py Wed Aug 03 22:34:54 2016 +0900
> > +++ b/mercurial/__init__.py Wed Aug 03 22:47:17 2016 +0900
> > @@ -310,6 +310,10 @@
> >          The added header has the form ``HG<VERSION>``. That is a literal
> >          ``HG`` with 2 binary bytes indicating the transformation version.
> >          """
> > +        _notransform = set([
> > +            'mercurial.demandimport',
> > +        ])
> > +
> >          def get_data(self, path):
> >              data = super(hgloader, self).get_data(path)
> >
> > @@ -336,9 +340,10 @@
> >
> >          def source_to_code(self, data, path):
> >              """Perform token transformation before compilation."""
> > -            buf = io.BytesIO(data)
> > -            tokens = tokenize.tokenize(buf.readline)
> > -            data = tokenize.untokenize(replacetokens(list(tokens)))
> > +            if self.name not in self._notransform:
> > +                buf = io.BytesIO(data)
> > +                tokens = tokenize.tokenize(buf.readline)
> > +                data = tokenize.untokenize(replacetokens(list(tokens)))
> >              # Python's built-in importer strips frames from exceptions raised
> >              # for this code. Unfortunately, that mechanism isn't extensible
> >              # and our frame will be blamed for the import failure. There
> >
> >
> > If (almost) all operations on string literals in the target source
> > code require unicode on Python 3.x, skipping the transformation
> > avoids adding an explicit 'u' prefix to existing string literals.
> >
> > For example, all operations on string literals in demandimport.py
> > involve the APIs below, which accept only unicode (str) on Python
> > 3.x.
> >
> > - manipulate module name
> > split(), formatting with "%s", __contains__(), and so on
> > - access to attributes by name
> > - access to values in os.environ
> > - access to values in sys.builtin_module_names
> >
> > pycompat.py and i18n.py also seem to work without the transformation.
> > At a quick glance, pure/osutil.py may too (though a few explicit 'b'
> > prefixes might be needed).
> >
> > How about skipping the transformation for such modules?
>
> I can go both ways. On one hand, not doing the transformation is
> ideal: the transforming is a giant hack to make porting more
> manageable. On the other, consistency is also good. Having to
> remember which modules are transformed and which aren't could be
> painful.
>
> I like the idea of something in the file that would tell the loader
> not to transform. And I think we have something already: "from
> __future__ import unicode_literals." Although that would use Unicode
> types everywhere, which isn't wanted when interfacing with certain
> Python APIs. So maybe we could throw a special comment at the top of
> the file? "# hgnotransform" or some such.
>
Yeah, marking on the file side is better than a blacklist (whitelist?)!
I'll try to work in that direction.
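
As a rough illustration of that direction (only a sketch, not the
eventual patch): the loader could look for a marker comment before the
first real token of the source and skip the token transformation when it
finds one. The marker text "# hgnotransform" is just the spelling
suggested above, and the helper name has_notransform_marker() is
hypothetical; neither exists in Mercurial.

```python
import io
import tokenize

# Hypothetical marker: the thread only suggests "# hgnotransform" or some such.
NOTRANSFORM_MARKER = "# hgnotransform"


def has_notransform_marker(data):
    """Return True if a leading comment in ``data`` (bytes) is the marker.

    Only comments that appear before the first real token are examined,
    so the same text inside a string literal, or later in the file, is
    ignored.
    """
    tokens = tokenize.tokenize(io.BytesIO(data).readline)
    for tok in tokens:
        # Skip the synthetic encoding token and line breaks.
        if tok.type in (tokenize.ENCODING, tokenize.NL, tokenize.NEWLINE):
            continue
        if tok.type == tokenize.COMMENT:
            if tok.string.strip() == NOTRANSFORM_MARKER:
                return True
            continue
        # First non-comment token reached: stop looking.
        return False
    return False
```

With something like this, source_to_code() could test
has_notransform_marker(data) instead of consulting a per-loader list of
module names, so the decision would live in the file itself.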
> >
> >
> >> Excerpts from Pulkit Goyal's message of 2016-08-03 01:57:23 +0530:
> >>> # HG changeset patch
> >>> # User Pulkit Goyal <7895pulkit at gmail.com>
> >>> # Date 1470161385 -19800
> >>> # Tue Aug 02 23:39:45 2016 +0530
> >>> # Node ID c03543a126719097a1a61c8e5ef5fcb222262315
> >>> # Parent 73ff159923c1f05899c27238409ca398342d9ae0
> >>> py3: use unicode literals in encoding.py
> >>>
> >>> The custom module loader adds a b'' prefix everywhere, making every string literal bytes. There are some instances
> >>> where we need unicode strings. This patch deals with such instances in encoding.py. It also
> >>> updates the output of test-check-py3-compat.t in some places where it was left unchanged.
> >>>
> >>> This series of patches is the work of Gregory Szorc and is taken from https://hg.mozilla.org/users/gszorc_mozilla.com/hg/shortlog/py3 .
> >> _______________________________________________
> >> Mercurial-devel mailing list
> >> Mercurial-devel at mercurial-scm.org
> >> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
> >
> > ----------------------------------------------------------------------
> > [FUJIWARA Katsunori] foozy at lares.dti.ne.jp
>
----------------------------------------------------------------------
[FUJIWARA Katsunori] foozy at lares.dti.ne.jp