[PATCH 01 of 10] py3: use unicode literals in encoding.py

Wed Aug 3 11:18:19 EDT 2016

At Wed, 3 Aug 2016 13:33:12 +0100,
Jun Wu wrote:
> 
> I think we may want special handling things like os.environ in the
> transformer instead. IIUC the decision about using the transformer approach
> is to reduce the need of these kinds of fixups.

As part of enabling demandimport on Python 3.x, I'm working on skipping
the code transformation for demandimport.py, with a change like the one below:

diff -r cf6739a27b8f mercurial/__init__.py
--- a/mercurial/__init__.py     Wed Aug 03 22:34:54 2016 +0900
+++ b/mercurial/__init__.py     Wed Aug 03 22:47:17 2016 +0900
@@ -310,6 +310,10 @@
         The added header has the form ``HG<VERSION>``. That is a literal
         ``HG`` with 2 binary bytes indicating the transformation version.
         """
+        _notransform = set([
+            'mercurial.demandimport',
+        ])
+
         def get_data(self, path):
             data = super(hgloader, self).get_data(path)

@@ -336,9 +340,10 @@

         def source_to_code(self, data, path):
             """Perform token transformation before compilation."""
-            buf = io.BytesIO(data)
-            tokens = tokenize.tokenize(buf.readline)
-            data = tokenize.untokenize(replacetokens(list(tokens)))
+            if self.name not in self._notransform:
+                buf = io.BytesIO(data)
+                tokens = tokenize.tokenize(buf.readline)
+                data = tokenize.untokenize(replacetokens(list(tokens)))
             # Python's built-in importer strips frames from exceptions raised
             # for this code. Unfortunately, that mechanism isn't extensible
             # and our frame will be blamed for the import failure. There


If (almost) all operations on string literals in the target source
file require unicode (str) on Python 3.x, skipping the transformation
for that file reduces the need to add an explicit 'u' prefix to
existing string literals.
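
To make the idea concrete, here is a rough standalone sketch of a
per-module skip list. It is not the actual hgloader code: the names
NOTRANSFORM, add_bytes_prefix() and maybe_transform() are hypothetical,
and the rewriting is simplified to "prefix every unprefixed string
literal with b" (it assumes ASCII-only literals and ignores docstrings
and other special cases).

```python
import io
import tokenize

# Hypothetical skip list, playing the role of a "do not transform" set.
NOTRANSFORM = {'mercurial.demandimport'}


def add_bytes_prefix(data):
    """Return source bytes with b'' added to every unprefixed string literal.

    Simplified for illustration: assumes literals contain only ASCII.
    """
    tokens = tokenize.tokenize(io.BytesIO(data).readline)
    out = []
    for tok in tokens:
        text = tok.string
        # Only touch string literals that start directly with a quote,
        # i.e. that carry no explicit prefix such as u'', b'' or r''.
        if tok.type == tokenize.STRING and text[0] in ('"', "'"):
            text = 'b' + text
        # (type, string) pairs put untokenize() in compatibility mode,
        # so the changed token lengths cannot confuse column positions.
        out.append((tok.type, text))
    return tokenize.untokenize(out)


def maybe_transform(modname, data):
    """Skip the token transformation for modules in the skip list."""
    if modname in NOTRANSFORM:
        return data
    return add_bytes_prefix(data)
```

With this sketch, a literal in a skipped module stays str on Python 3,
while the same literal in any other module becomes bytes.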

For example, all operations on string literals in demandimport.py
involve the APIs below, which accept only unicode (str) on Python
3.x.

  - manipulating module names
    (split(), formatting with "%s", __contains__(), and so on)
  - accessing attributes by name
  - accessing values in os.environ
  - checking names against sys.builtin_module_names
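
A small sketch of why bytes literals do not fit these APIs on Python 3.
The helper raises_type_error() is hypothetical, and the module and
environment variable names are used only for illustration:

```python
import os
import sys


def raises_type_error(func):
    """Return True if calling func() raises TypeError (hypothetical helper)."""
    try:
        func()
    except TypeError:
        return True
    return False


# Module names are str, so the separator must be str as well.
modname = 'mercurial.demandimport'
parts = modname.split('.')
split_bytes_fails = raises_type_error(lambda: modname.split(b'.'))

# Attribute names passed to getattr() must be str.
getattr_bytes_fails = raises_type_error(lambda: getattr(os, b'path'))

# os.environ keys must be str (os.environb exists for bytes on POSIX).
environ_bytes_fails = raises_type_error(
    lambda: os.environ.get(b'HGDEMANDIMPORT'))

# sys.builtin_module_names is a tuple of str, so a bytes name does not
# raise; it simply never matches.
str_found = 'sys' in sys.builtin_module_names
bytes_found = b'sys' in sys.builtin_module_names
```

The last case is the subtle one: the membership test fails silently
instead of raising, so a transformed b'' literal would quietly change
behavior there.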

pycompat.py and i18n.py also seem to work with the transformation
skipped. At a quick glance, pure/osutil.py might, too (though a few
extra explicit 'b' prefixes might be needed).

What do you think of this approach?


> Excerpts from Pulkit Goyal's message of 2016-08-03 01:57:23 +0530:
> > # HG changeset patch
> > # User Pulkit Goyal <7895pulkit at gmail.com>
> > # Date 1470161385 -19800
> > #      Tue Aug 02 23:39:45 2016 +0530
> > # Node ID c03543a126719097a1a61c8e5ef5fcb222262315
> > # Parent  73ff159923c1f05899c27238409ca398342d9ae0
> > py3: use unicode literals in encoding.py
> > 
> > The custom module loader adds a b'' everywhere and hence making everything bytes. There are some instances
> > where we need to have unicodes. This patch deals with such instances in encoding.py. Moreover this patch also
> > updates the output of test-check-py3-compat.t at some places which was left unchanged.
> > 
> > This series of patches is work of Gregory Szorc and are taken from https://hg.mozilla.org/users/gszorc_mozilla.com/hg/shortlog/py3 .

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp