[PATCH 6 of 6 import-refactor V3] mercurial: support loading modules from zipimporter

Gregory Szorc gregory.szorc at gmail.com
Fri Dec 4 12:50:19 CST 2015


On Fri, Dec 4, 2015 at 10:27 AM, Brett Cannon <brett at python.org> wrote:

>
>
> On Fri, 4 Dec 2015 at 07:13 Augie Fackler <raf at durin42.com> wrote:
>
>> (+brett, who might have some idea about import behavior and is
>> probably going to be interested in this change anyway.)
>>
>> On Thu, Dec 03, 2015 at 09:51:12PM -0800, Gregory Szorc wrote:
>> > # HG changeset patch
>> > # User Gregory Szorc <gregory.szorc at gmail.com>
>> > # Date 1449206705 28800
>> > #      Thu Dec 03 21:25:05 2015 -0800
>> > # Node ID 782783ea499c7e03253b55f0b0c221b9946356e2
>> > # Parent  dd8b23489d2a5f5580905c995333d139fe457d1c
>> > mercurial: support loading modules from zipimporter
>> >
>> > The previous refactor to module importing broke module loading when
>> > mercurial.* modules were loaded from a zipfile (using a zipimporter).
>> > This scenario is likely encountered when using py2exe.
>> >
>> > Supporting zipimporter and the traditional importer side-by-side
>> > turns out to be quite a pain. In Python 2.x, the standard, file-based
>> > import mechanism is partially implemented in C. The sys.meta_path
>> > and sys.path_hooks hook points exist to allow custom importers in
>> > Python/userland. zipimport.zipimporter and our "hgimporter" class
>> > from earlier in this patch series are 2 of these.
>> >
>> > In a standard Python installation (no matter if running in py2exe
>> > or similar or not), zipimport.zipimporter appears to be registered
>> > in sys.path_hooks. This means that as each sys.path entry is
>> > consulted, it will ask zipimporter if it supports that path and
>> > zipimporter will be used if that entry is a zip file. In a
>> > py2exe environment, sys.path contains an entry with the path to
>> > the zip file containing the Python standard library along with
>> > Mercurial's Python files.
>> >
>> > The way the importer mechanism works is the first importer that
>> > declares knowledge of a module (via find_module() returning an
>> > object) gets to load it. Since our "hgimporter" is registered
>> > in sys.meta_path and returns an interest in specific mercurial.*
>> > modules, the zipimporter registered on sys.path_hooks never comes
>> > into play for these modules. So, we need to be zipimporter aware
>> > and call into zipimporter to load modules.
>> >
>> > This patch teaches "hgimporter" how to call out into zipimporter
>> > when necessary. We detect the necessity of zipimporter by looking
>> > at the loader for the "mercurial" module. If it is a zipimporter
>> > instance, we load via zipimporter.
>> >
>> > The behavior of zipimporter is a bit wonky.
>>
>> Patch is already queued, but Brett, do you have any insight into why
>> zipimport works this way? (see below)
>>
>> >
>> > You appear to need separate zipimporter instances for each directory
>> > in the zip file. I'm not sure why this is.
>
>
> sys.path_importer_cache is the most likely culprit for structuring it this
> way, but otherwise it's just the way someone chose to implement it (the
> other approach would have been to cache a single importer per zipfile in
> the sys.path_hooks object and then have the finders and loaders be
> basically a single instance per zipfile).
>
> FYI we are going to rewrite zipimporter in 3.6 or 3.7.
>
> I suspect it has
>> > something to do with the low-level importing mechanism (implemented
>> > in C) operating on a per-directory basis. PEP-302 makes some
>> > references to this. I was not able to get a zipimporter to
>> > import modules outside of its immediate directory no matter how
>> > I specified the module name. This is why we use separate
>> > zipimporter instances for the ".zip/mercurial" and
>> > ".zip/mercurial/pure" locations.
>> >
>> > The zipimporter documentation for Python 2.7 explicitly states that
>> > zipimporter does not import dynamic modules (C extensions). Yet from
>> > a py2exe distribution on Windows - where the .pyd files are *not*
>> > in the zip archive - zipimporter imported these dynamic modules
>> > just fine!
>
>
> That doesn't work on UNIX which is why the docs say extension modules are
> not supported.
>
>
>> I'm not sure if dynamic modules can't be imported from
>> > *inside* the zip archive or whether zipimporter looks for dynamic
>> > modules outside the zip archive. All I know is zipimporter does
>> > manage to import the .pyd files on Windows and this patch makes
>> > our new importer compatible with py2exe.
>> >
>> > In the ideal world, We'd probably reimplement or fall back to parts
>> > of the built-in import mechanism instead of handling zipimporter
>> > specially. After all, if we're loading Mercurial modules via
>> > something that isn't the built-in file-based importer or zipimporter,
>> > our custom importer will likely fail because it doesn't know how to
>> > call into it. I'd like to think that we'll never encounter this
>> > in the wild, but you never know. If we do encounter it, we can
>> > come up with another solution.
>>
>
> Sticking yourselves on sys.meta_path does complicate things since
> sys.meta_path is explicitly for handling alternative storage solutions for
> modules, i.e., built-in modules, frozen modules, and file-based storage.
> What it sounds like you're trying to do is wrap the normal file-based
> import stuff which is tough in Python 2 since that's all implicitly done.
>
>
>> >
>> > It's worth nothing that Python 3 has moved a lot of the importing
>> > code from C to Python. Python 3 gives you near total control over
>> > the import mechanism. So in the very distant future when Mercurial
>> > drops Python 2 support, it's likely that our custom importer code
>> > can be refactored to something a bit saner.
>>
>
> Not sure what your custom importer does, but if it exists for lazy loading
> then I already implemented it for you in Python 3.5:
> https://docs.python.org/3/library/importlib.html#importlib.util.LazyLoader
>

Mercurial has C and Python implementations for a handful of modules. We
desire to run both forms from the same installation so e.g. CPython and
PyPy can share the same installation.

The custom importer (
http://hg.netv6.net/clowncopter/file/30a20167ae29/mercurial/__init__.py)
basically looks in multiple locations for these modules. Depending on the
environment, it will try to load modules from mercurial/* and
mercurial/pure/*.

We could probably move files around and use a custom function to obtain a
reference to the module (instead of "import" throughout all the code that
imports these modules). But this would have required rewriting a bunch of
code. I felt it was easier to employ importer hackery than to rewrite the
world.

I'm receptive to better ideas.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20151204/eb948232/attachment.html>


More information about the Mercurial-devel mailing list