[PATCH 1 of 3 RFC] mercurial: implement a source transforming module loader on Python 3

Simon King simon at simonking.org.uk
Mon May 16 11:43:47 EDT 2016


I don't think that's supposed to happen, is it? Python should
automatically invalidate .pyc files based on a magic number that
changes when the format changes:

https://hg.python.org/cpython/file/2.7/Python/import.c#l31
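
A minimal sketch of the check described above, written for Python 3, where the interpreter's magic number is exposed as importlib.util.MAGIC_NUMBER (on Python 2 the equivalent is imp.get_magic()). The helper name pyc_matches_interpreter is hypothetical. It compares only the first four bytes of the file and ignores the source timestamp check that the import system also performs.

```python
import importlib.util
import os
import py_compile
import tempfile


def pyc_matches_interpreter(pyc_path):
    """Return True if the .pyc starts with this interpreter's magic number."""
    with open(pyc_path, 'rb') as fh:
        return fh.read(4) == importlib.util.MAGIC_NUMBER


# Compile a throwaway module and inspect the resulting cache file.
tmpdir = tempfile.mkdtemp()
source = os.path.join(tmpdir, 'demo.py')
with open(source, 'w') as fh:
    fh.write('x = 1\n')
pyc = py_compile.compile(source, cfile=os.path.join(tmpdir, 'demo.pyc'))
print(pyc_matches_interpreter(pyc))
```

A cache file written by an interpreter with a different bytecode format would fail this comparison and be recompiled from source.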

Simon

On Mon, May 16, 2016 at 4:31 PM, timeless <timeless at gmail.com> wrote:
> FWIW, we already need some cache invalidation. Switching between Python 2.6
> and 2.7 results in really bad outcomes. :)
>
> On May 16, 2016 12:03 AM, "Gregory Szorc" <gregory.szorc at gmail.com> wrote:
>>
>> # HG changeset patch
>> # User Gregory Szorc <gregory.szorc at gmail.com>
>> # Date 1463370916 25200
>> #      Sun May 15 20:55:16 2016 -0700
>> # Node ID 7c5d1f8db9618f511f40bc4089145310671ca57b
>> # Parent  f8b87a779c87586aa043bcd6030369715edfc9c1
>> mercurial: implement a source transforming module loader on Python 3
>>
>> The most painful part of ensuring Python code runs on both Python 2
>> and 3 is string encoding. The difficulty is that string literals in
>> Python 2 are bytes while string literals in Python 3 are unicode. So,
>> to ensure consistent types are used, you have to use
>> "from __future__ import unicode_literals" and/or prefix literals
>> with their type (e.g. b'foo' or u'foo').
>>
>> Nearly every string in Mercurial is bytes. So, to use the same source
>> code on both Python 2 and 3 would require prefixing nearly every
>> string literal with "b" to make it a byte literal. This is ugly and
>> not something mpm is willing to do.
>>
>> This patch implements a custom module loader on Python 3 that performs
>> source transformation to convert string literals (unicode in Python 3)
>> to byte literals. In effect, it changes Python 3's string literals to
>> behave like Python 2's.
>>
>> The module loader is only used on mercurial.* and hgext.* modules.
>>
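
To illustrate how a loader can be limited to certain package prefixes, here is a rough sketch of a sys.meta_path finder that lets the other registered finders locate the module and then swaps in a custom loader. The names prefixfinder, prefixes and loadercls are hypothetical and not taken from the patch. The sketch assumes the custom loader takes (name, path) like importlib.machinery.SourceFileLoader, and it leaves non-source modules untouched.

```python
import importlib.abc
import importlib.machinery
import sys


class prefixfinder(importlib.abc.MetaPathFinder):
    """Swap in a custom source loader for modules under given prefixes."""

    def __init__(self, prefixes, loadercls):
        self._prefixes = tuple(prefixes)
        self._loadercls = loadercls

    def find_spec(self, fullname, path, target=None):
        # Modules outside the chosen prefixes are left to other finders.
        if not fullname.startswith(self._prefixes):
            return None

        # Ask the remaining finders to locate the module.
        for finder in sys.meta_path:
            if finder is self:
                continue
            find_spec = getattr(finder, 'find_spec', None)
            if find_spec is None:
                continue
            spec = find_spec(fullname, path, target=target)
            if spec is not None:
                break
        else:
            return None

        # Only plain source files get the custom loader.
        if isinstance(spec.loader, importlib.machinery.SourceFileLoader):
            spec.loader = self._loadercls(spec.name, spec.origin)
        return spec
```

With the prefixes ('mercurial.', 'hgext.'), the top-level packages themselves do not match, only their submodules, which is also how the startswith() test in the patch behaves.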
>> The loader works by tokenizing the loaded source and replacing
>> "string" tokens if necessary. The modified token stream is
>> untokenized back to source and loaded like normal. This does add some
>> overhead. However, this all occurs before caching. So .pyc files should
>> cache the version with byte literals.
>>
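
The tokenize/untokenize round trip described above can be tried outside the import machinery. The following is a small standalone sketch, not the patch itself; the function name bytesify is hypothetical. It applies the same rule as the patch: unprefixed, non-triple-quoted string literals gain a b prefix and everything else passes through.

```python
import io
import token
import tokenize


def bytesify(source):
    """Return source (bytes) with plain string literals turned into b'' literals."""
    def replace(t):
        if t.type != token.STRING:
            return t
        s = t.string
        # Triple-quoted strings are treated as docstrings and kept as-is.
        if s[0:3] in ("'''", '"""'):
            return t
        # Literals that already carry a prefix (b'', u'', r'') are kept too.
        if s[0] not in ("'", '"'):
            return t
        return t._replace(string='b' + s)

    tokens = tokenize.tokenize(io.BytesIO(source).readline)
    # The token stream starts with an ENCODING token, so untokenize()
    # returns bytes that can be passed straight to compile().
    return tokenize.untokenize(replace(t) for t in tokens)


namespace = {}
exec(compile(bytesify(b"x = 'abc'\ny = u'def'\n"), '<demo>', 'exec'), namespace)
print(type(namespace['x']), type(namespace['y']))
```

The token positions are left unchanged even though the text grows by one character, so columns in the regenerated source shift slightly. That does not affect how the code compiles.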
>> This patch isn't suitable for checkin. There are a few deficiencies,
>> including that changes to the loader won't result in the cache
>> being invalidated. As part of testing this, I've had to manually
>> blow away __pycache__ directories. We'll likely need to hack up
>> cache checking as well so caching is invalidated when
>> mercurial/__init__.py changes. This is going to be ugly.
>>
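
One possible way to get that invalidation, sketched under assumptions that are not part of this patch: prefix every cached bytecode file with a small header naming the transformer version, and treat a cache file without the expected header as unreadable so the import system recompiles from source. BYTECODEHEADER, addheader, stripheader and versionedloader are hypothetical names. The sketch relies on SourceFileLoader routing bytecode reads and writes through get_data() and set_data().

```python
import importlib.machinery

# Hypothetical header. Bump the last byte whenever the transformation changes.
BYTECODEHEADER = b'HG\x00\x01'
_SUFFIXES = tuple(importlib.machinery.BYTECODE_SUFFIXES)


def addheader(data):
    """Prefix cached bytecode with the transformer version header."""
    return BYTECODEHEADER + data


def stripheader(data):
    """Remove the header, or raise OSError if the cache is from another version."""
    if not data.startswith(BYTECODEHEADER):
        raise OSError('bytecode cache written by a different transformer')
    return data[len(BYTECODEHEADER):]


class versionedloader(importlib.machinery.SourceFileLoader):
    def get_data(self, path):
        data = super().get_data(path)
        if path.endswith(_SUFFIXES):
            # An OSError here makes the import system ignore the cache file.
            return stripheader(data)
        return data

    def set_data(self, path, data, **kwargs):
        if path.endswith(_SUFFIXES):
            data = addheader(data)
        return super().set_data(path, data, **kwargs)
```

A real loader would combine this with the source_to_code() override below. The header only has to change when the transformation itself changes, not on every edit to mercurial/__init__.py.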
>> diff --git a/mercurial/__init__.py b/mercurial/__init__.py
>> --- a/mercurial/__init__.py
>> +++ b/mercurial/__init__.py
>> @@ -139,14 +139,89 @@ class hgimporter(object):
>>              if not modinfo:
>>                  raise ImportError('could not find mercurial module %s' %
>>                                    name)
>>
>>          mod = imp.load_module(name, *modinfo)
>>          sys.modules[name] = mod
>>          return mod
>>
>> +if sys.version_info[0] >= 3:
>> +    from . import pure
>> +    import importlib
>> +    import io
>> +    import token
>> +    import tokenize
>> +
>> +    class hgpathentryfinder(importlib.abc.MetaPathFinder):
>> +        """A sys.meta_path finder."""
>> +        def find_spec(self, fullname, path, target=None):
>> +            # Our custom loader rewrites source code and Python code
>> +            # that doesn't belong to Mercurial doesn't expect this.
>> +            if not fullname.startswith(('mercurial.', 'hgext.')):
>> +                return None
>> +
>> +            # This assumes Python 3 doesn't support loading C modules.
>> +            if fullname in _dualmodules:
>> +                stem = fullname.split('.')[-1]
>> +                fullname = 'mercurial.pure.%s' % stem
>> +                target = pure
>> +                assert len(path) == 1
>> +                path = [os.path.join(path[0], 'pure')]
>> +
>> +            # Try to find the module using other registered finders.
>> +            spec = None
>> +            for finder in sys.meta_path:
>> +                if finder == self:
>> +                    continue
>> +
>> +                spec = finder.find_spec(fullname, path, target=target)
>> +                if spec:
>> +                    break
>> +
>> +            if not spec:
>> +                return None
>> +
>> +            if fullname.startswith('mercurial.pure.'):
>> +                spec.name = spec.name.replace('.pure.', '.')
>> +
>> +            # TODO need to support loaders from alternate specs, like zip
>> +            # loaders.
>> +            spec.loader = hgloader(spec.name, spec.origin)
>> +            return spec
>> +
>> +    def replacetoken(t):
>> +        if t.type == token.STRING:
>> +            s = t.string
>> +
>> +            # If a docstring, keep it as a string literal.
>> +            if s[0:3] in ("'''", '"""'):
>> +                return t
>> +
>> +            if s[0] not in ("'", '"'):
>> +                return t
>> +
>> +            # String literal. Prefix to make a b'' string.
>> +            return tokenize.TokenInfo(t.type, 'b%s' % s, t.start, t.end, t.line)
>> +
>> +        return t
>> +
>> +    class hgloader(importlib.machinery.SourceFileLoader):
>> +        """Custom module loader that transforms source code.
>> +
>> +        When the source code is converted to code, we first transform
>> +        string literals to byte literals using the tokenize API.
>> +        """
>> +        def source_to_code(self, data, path):
>> +            buf = io.BytesIO(data)
>> +            tokens = tokenize.tokenize(buf.readline)
>> +            data = tokenize.untokenize(replacetoken(t) for t in tokens)
>> +            return super(hgloader, self).source_to_code(data, path)
>> +
>>  # We automagically register our custom importer as a side-effect of loading.
>>  # This is necessary to ensure that any entry points are able to import
>>  # mercurial.* modules without having to perform this registration themselves.
>> -if not any(isinstance(x, hgimporter) for x in sys.meta_path):
>> -    # meta_path is used before any implicit finders and before sys.path.
>> -    sys.meta_path.insert(0, hgimporter())
>> +if sys.version_info[0] >= 3:
>> +    sys.meta_path.insert(0, hgpathentryfinder())
>> +else:
>> +    if not any(isinstance(x, hgimporter) for x in sys.meta_path):
>> +        # meta_path is used before any implicit finders and before sys.path.
>> +        sys.meta_path.insert(0, hgimporter())
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel at mercurial-scm.org
>> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

