[PATCH 1 of 3 RFC] mercurial: implement a source transforming module loader on Python 3

timeless timeless at gmail.com
Mon May 16 11:31:28 EDT 2016


FWIW, we already need some cache invalidation. Switching between Python 2.6
and 2.7 results in really bad outcomes. :)
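
One possible shape for that invalidation on Python 3, purely as a sketch
and not something the patch below does: tag the cached bytecode with a
loader-specific header, and treat any .pyc without the current header as
missing. The names `headerloader` and `HEADER` are made up for
illustration. It relies on the import machinery falling back to the
source when get_data() raises OSError for the bytecode path.

```python
import importlib.machinery

# Hypothetical marker. Bump the last byte whenever the source
# transformation changes so that older cache files stop matching.
HEADER = b'HG\x00\x01'

_BYTECODE_SUFFIXES = tuple(importlib.machinery.BYTECODE_SUFFIXES)


class headerloader(importlib.machinery.SourceFileLoader):
    """Sketch of a loader that ignores bytecode lacking its header."""

    header = HEADER

    def get_data(self, path):
        data = super().get_data(path)
        if not path.endswith(_BYTECODE_SUFFIXES):
            # Source files are returned untouched.
            return data
        if not data.startswith(self.header):
            # An OSError here makes the caller recompile from source.
            raise OSError('missing or outdated bytecode header')
        return data[len(self.header):]

    def set_data(self, path, data, *args, **kwargs):
        if path.endswith(_BYTECODE_SUFFIXES):
            # Prepend the marker before the bytecode is written out.
            data = self.header + data
        return super().set_data(path, data, *args, **kwargs)
```

With something like this, changing `header` is enough to make every
previously written cache file look stale, so nobody has to delete
__pycache__ by hand.
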
On May 16, 2016 12:03 AM, "Gregory Szorc" <gregory.szorc at gmail.com> wrote:

> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1463370916 25200
> #      Sun May 15 20:55:16 2016 -0700
> # Node ID 7c5d1f8db9618f511f40bc4089145310671ca57b
> # Parent  f8b87a779c87586aa043bcd6030369715edfc9c1
> mercurial: implement a source transforming module loader on Python 3
>
> The most painful part of ensuring Python code runs on both Python 2
> and 3 is string encoding. Making this difficult is that string
> literals in Python 2 are bytes and string literals in Python 3 are
> unicode. So, to ensure consistent types are used, you have to
> use "from __future__ import unicode_literals" and/or prefix literals
> with their type (e.g. b'foo' or u'foo').
>
> Nearly every string in Mercurial is bytes. So, to use the same source
> code on both Python 2 and 3 would require prefixing nearly every
> string literal with "b" to make it a byte literal. This is ugly and
> not something mpm is willing to do.
>
> This patch implements a custom module loader on Python 3 that performs
> source transformation to convert string literals (unicode in Python 3)
> to byte literals. In effect, it changes Python 3's string literals to
> behave like Python 2's.
>
> The module loader is only used on mercurial.* and hgext.* modules.
>
> The loader works by tokenizing the loaded source and replacing
> "string" tokens if necessary. The modified token stream is
> untokenized back to source and loaded like normal. This does add some
> overhead. However, this all occurs before caching. So .pyc files should
> cache the version with byte literals.
>
> This patch isn't suitable for checkin. There are a few deficiencies,
> including that changes to the loader won't result in the cache
> being invalidated. As part of testing this, I've had to manually
> blow away __pycache__ directories. We'll likely need to hack up
> cache checking as well so caching is invalidated when
> mercurial/__init__.py changes. This is going to be ugly.
>
> diff --git a/mercurial/__init__.py b/mercurial/__init__.py
> --- a/mercurial/__init__.py
> +++ b/mercurial/__init__.py
> @@ -139,14 +139,89 @@ class hgimporter(object):
>              if not modinfo:
>                  raise ImportError('could not find mercurial module %s' %
>                                    name)
>
>          mod = imp.load_module(name, *modinfo)
>          sys.modules[name] = mod
>          return mod
>
> +if sys.version_info[0] >= 3:
> +    from . import pure
> +    import importlib
> +    import io
> +    import token
> +    import tokenize
> +
> +    class hgpathentryfinder(importlib.abc.MetaPathFinder):
> +        """A sys.meta_path finder."""
> +        def find_spec(self, fullname, path, target=None):
> +            # Our custom loader rewrites source code and Python code
> +            # that doesn't belong to Mercurial doesn't expect this.
> +            if not fullname.startswith(('mercurial.', 'hgext.')):
> +                return None
> +
> +            # This assumes Python 3 doesn't support loading C modules.
> +            if fullname in _dualmodules:
> +                stem = fullname.split('.')[-1]
> +                fullname = 'mercurial.pure.%s' % stem
> +                target = pure
> +                assert len(path) == 1
> +                path = [os.path.join(path[0], 'pure')]
> +
> +            # Try to find the module using other registered finders.
> +            spec = None
> +            for finder in sys.meta_path:
> +                if finder == self:
> +                    continue
> +
> +                spec = finder.find_spec(fullname, path, target=target)
> +                if spec:
> +                    break
> +
> +            if not spec:
> +                return None
> +
> +            if fullname.startswith('mercurial.pure.'):
> +                spec.name = spec.name.replace('.pure.', '.')
> +
> +            # TODO need to support loaders from alternate specs, like zip
> +            # loaders.
> +            spec.loader = hgloader(spec.name, spec.origin)
> +            return spec
> +
> +    def replacetoken(t):
> +        if t.type == token.STRING:
> +            s = t.string
> +
> +            # If a docstring, keep it as a string literal.
> +            if s[0:3] in ("'''", '"""'):
> +                return t
> +
> +            if s[0] not in ("'", '"'):
> +                return t
> +
> +            # String literal. Prefix to make a b'' string.
> +            return tokenize.TokenInfo(t.type, 'b%s' % s, t.start, t.end,
> +                                      t.line)
> +
> +        return t
> +
> +    class hgloader(importlib.machinery.SourceFileLoader):
> +        """Custom module loader that transforms source code.
> +
> +        When the source code is converted to code, we first transform
> +        string literals to byte literals using the tokenize API.
> +        """
> +        def source_to_code(self, data, path):
> +            buf = io.BytesIO(data)
> +            tokens = tokenize.tokenize(buf.readline)
> +            data = tokenize.untokenize(replacetoken(t) for t in tokens)
> +            return super(hgloader, self).source_to_code(data, path)
> +
>  # We automagically register our custom importer as a side-effect of loading.
>  # This is necessary to ensure that any entry points are able to import
>  # mercurial.* modules without having to perform this registration themselves.
> -if not any(isinstance(x, hgimporter) for x in sys.meta_path):
> -    # meta_path is used before any implicit finders and before sys.path.
> -    sys.meta_path.insert(0, hgimporter())
> +if sys.version_info[0] >= 3:
> +    sys.meta_path.insert(0, hgpathentryfinder())
> +else:
> +    if not any(isinstance(x, hgimporter) for x in sys.meta_path):
> +        # meta_path is used before any implicit finders and before sys.path.
> +        sys.meta_path.insert(0, hgimporter())
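
For anyone who wants to poke at the token rewriting without installing
the loader, here is a minimal standalone sketch. `replacetoken` follows
the function in the patch; `tobytesliterals` is a hypothetical helper
added only for this example. It mirrors what source_to_code() does
before handing the result to the normal compile step.

```python
import io
import token
import tokenize


def replacetoken(t):
    """Prefix plain string literals with b, as in the patch above."""
    if t.type == token.STRING:
        s = t.string
        # Triple-quoted strings (typically docstrings) are left alone.
        if s[0:3] in ("'''", '"""'):
            return t
        # Literals that already carry a prefix (b'', u'', r'') are kept.
        if s[0] not in ("'", '"'):
            return t
        return tokenize.TokenInfo(t.type, 'b%s' % s, t.start, t.end,
                                  t.line)
    return t


def tobytesliterals(source):
    """Hypothetical helper: rewrite source bytes, return source bytes."""
    tokens = tokenize.tokenize(io.BytesIO(source).readline)
    return tokenize.untokenize(replacetoken(t) for t in tokens)


if __name__ == '__main__':
    src = (b"x = 'foo'\n"
           b"y = u'bar'\n"
           b"def f():\n"
           b'    """doc"""\n'
           b"    return 'v'\n")
    ns = {}
    exec(compile(tobytesliterals(src), '<demo>', 'exec'), ns)
    # x and the return value of f() are bytes; y and the docstring
    # remain str.
    print(type(ns['x']), type(ns['y']), type(ns['f']()))
```

Note that the triple-quote check keeps every triple-quoted literal as
str, not only docstrings, so a multi-line byte string would still need
an explicit b prefix under this scheme.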