[PATCH 6 of 6 import-refactor V3] mercurial: support loading modules from zipimporter

Brett Cannon brett at python.org
Fri Dec 4 12:27:40 CST 2015


On Fri, 4 Dec 2015 at 07:13 Augie Fackler <raf at durin42.com> wrote:

> (+brett, who might have some idea about import behavior and is
> probably going to be interested in this change anyway.)
>
> On Thu, Dec 03, 2015 at 09:51:12PM -0800, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc at gmail.com>
> > # Date 1449206705 28800
> > #      Thu Dec 03 21:25:05 2015 -0800
> > # Node ID 782783ea499c7e03253b55f0b0c221b9946356e2
> > # Parent  dd8b23489d2a5f5580905c995333d139fe457d1c
> > mercurial: support loading modules from zipimporter
> >
> > The previous refactor to module importing broke module loading when
> > mercurial.* modules were loaded from a zipfile (using a zipimporter).
> > This scenario is likely encountered when using py2exe.
> >
> > Supporting zipimporter and the traditional importer side-by-side
> > turns out to be quite a pain. In Python 2.x, the standard, file-based
> > import mechanism is partially implemented in C. The sys.meta_path
> > and sys.path_hooks hook points exist to allow custom importers in
> > Python/userland. zipimport.zipimporter and our "hgimporter" class
> > from earlier in this patch series are 2 of these.
> >
> > In a standard Python installation (no matter if running in py2exe
> > or similar or not), zipimport.zipimporter appears to be registered
> > in sys.path_hooks. This means that as each sys.path entry is
> > consulted, it will ask zipimporter if it supports that path and
> > zipimporter will be used if that entry is a zip file. In a
> > py2exe environment, sys.path contains an entry with the path to
> > the zip file containing the Python standard library along with
> > Mercurial's Python files.
> >
> > The way the importer mechanism works is the first importer that
> > declares knowledge of a module (via find_module() returning an
> > object) gets to load it. Since our "hgimporter" is registered
> > in sys.meta_path and returns an interest in specific mercurial.*
> > modules, the zipimporter registered on sys.path_hooks never comes
> > into play for these modules. So, we need to be zipimporter aware
> > and call into zipimporter to load modules.
> >
> > This patch teaches "hgimporter" how to call out into zipimporter
> > when necessary. We detect the necessity of zipimporter by looking
> > at the loader for the "mercurial" module. If it is a zipimporter
> > instance, we load via zipimporter.
> >
> > The behavior of zipimporter is a bit wonky.
>
> Patch is already queued, but Brett, do you have any insight into why
> zipimport works this way? (see below)
>
> >
> > You appear to need separate zipimporter instances for each directory
> > in the zip file. I'm not sure why this is.


sys.path_importer_cache is the most likely culprit for structuring it this
way, but otherwise it's just the way someone chose to implement it (the
other approach would have been to cache a single importer per zipfile in
the sys.path_hooks object and then have the finders and loaders be
basically a single instance per zipfile).

FYI we are going to rewrite zipimporter in 3.6 or 3.7.

I suspect it has
> > something to do with the low-level importing mechanism (implemented
> > in C) operating on a per-directory basis. PEP-302 makes some
> > references to this. I was not able to get a zipimporter to
> > import modules outside of its immediate directory no matter how
> > I specified the module name. This is why we use separate
> > zipimporter instances for the ".zip/mercurial" and
> > ".zip/mercurial/pure" locations.
> >
> > The zipimporter documentation for Python 2.7 explicitly states that
> > zipimporter does not import dynamic modules (C extensions). Yet from
> > a py2exe distribution on Windows - where the .pyd files are *not*
> > in the zip archive - zipimporter imported these dynamic modules
> > just fine!


That doesn't work on UNIX which is why the docs say extension modules are
not supported.


> I'm not sure if dynamic modules can't be imported from
> > *inside* the zip archive or whether zipimporter looks for dynamic
> > modules outside the zip archive. All I know is zipimporter does
> > manage to import the .pyd files on Windows and this patch makes
> > our new importer compatible with py2exe.
> >
> > In the ideal world, We'd probably reimplement or fall back to parts
> > of the built-in import mechanism instead of handling zipimporter
> > specially. After all, if we're loading Mercurial modules via
> > something that isn't the built-in file-based importer or zipimporter,
> > our custom importer will likely fail because it doesn't know how to
> > call into it. I'd like to think that we'll never encounter this
> > in the wild, but you never know. If we do encounter it, we can
> > come up with another solution.
>

Sticking yourselves on sys.meta_path does complicate things since
sys.meta_path is explicitly for handling alternative storage solutions for
modules, i.e., built-in modules, frozen modules, and file-based storage.
What it sounds like you're trying to do is wrap the normal file-based
import stuff which is tough in Python 2 since that's all implicitly done.


> >
> > It's worth nothing that Python 3 has moved a lot of the importing
> > code from C to Python. Python 3 gives you near total control over
> > the import mechanism. So in the very distant future when Mercurial
> > drops Python 2 support, it's likely that our custom importer code
> > can be refactored to something a bit saner.
>

Not sure what your custom importer does, but if it exists for lazy loading
then I already implemented it for you in Python 3.5:
https://docs.python.org/3/library/importlib.html#importlib.util.LazyLoader

-Brett


> >
> > diff --git a/mercurial/__init__.py b/mercurial/__init__.py
> > --- a/mercurial/__init__.py
> > +++ b/mercurial/__init__.py
> > @@ -5,16 +5,17 @@
> >  # This software may be used and distributed according to the terms of
> the
> >  # GNU General Public License version 2 or any later version.
> >
> >  from __future__ import absolute_import
> >
> >  import imp
> >  import os
> >  import sys
> > +import zipimport
> >
> >  __all__ = []
> >
> >  # Rules for how modules can be loaded. Values are:
> >  #
> >  #    c - require C extensions
> >  #    allow - allow pure Python implementation when C loading fails
> >  #    py - only load pure Python modules
> > @@ -55,16 +56,46 @@ class hgimporter(object):
> >
> >      def load_module(self, name):
> >          mod = sys.modules.get(name, None)
> >          if mod:
> >              return mod
> >
> >          mercurial = sys.modules['mercurial']
> >
> > +        # The zip importer behaves sufficiently differently from the
> default
> > +        # importer to warrant its own code path.
> > +        loader = getattr(mercurial, '__loader__', None)
> > +        if isinstance(loader, zipimport.zipimporter):
> > +            def ziploader(*paths):
> > +                """Obtain a zipimporter for a directory under the main
> zip."""
> > +                path = os.path.join(loader.archive, *paths)
> > +                zl = sys.path_importer_cache.get(path)
> > +                if not zl:
> > +                    zl = zipimport.zipimporter(path)
> > +                return zl
> > +
> > +            try:
> > +                if modulepolicy == 'py':
> > +                    raise ImportError()
> > +
> > +                zl = ziploader('mercurial')
> > +                mod = zl.load_module(name)
> > +                # Unlike imp, ziploader doesn't expose module metadata
> that
> > +                # indicates the type of module. So just assume what we
> found
> > +                # is OK (even though it could be a pure Python module).
> > +            except ImportError:
> > +                if modulepolicy == 'c':
> > +                    raise
> > +                zl = ziploader('mercurial', 'pure')
> > +                mod = zl.load_module(name)
> > +
> > +            sys.modules[name] = mod
> > +            return mod
> > +
> >          # Unlike the default importer which searches special locations
> and
> >          # sys.path, we only look in the directory where "mercurial" was
> >          # imported from.
> >
> >          # imp.find_module doesn't support submodules (modules with ".").
> >          # Instead you have to pass the parent package's __path__
> attribute
> >          # as the path argument.
> >          stem = name.split('.')[-1]
> > _______________________________________________
> > Mercurial-devel mailing list
> > Mercurial-devel at selenic.com
> > https://selenic.com/mailman/listinfo/mercurial-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20151204/89470391/attachment.html>


More information about the Mercurial-devel mailing list