Using Rust in Mercurial
This page describes the plan and status for leveraging the Rust programming language in Mercurial.
Desired End State
hg is a Rust binary that embeds and uses a Python interpreter when appropriate (hg is a Python script today)
- Python code seemlessly calls out to functionality implemented in Rust
- Fully self-contained Mercurial distributions are available (Python is an implementation detail / Mercurial sufficiently independent from other Python presence on system)
- The customizability and extensibility of Mercurial through extensions is not significantly weakened.
chg functionality is rolled into hg
CRT Mismatch on Windows
Mercurial still uses Python 2.7. Python 2.7 is officially compiled with MSVC 2008 and links against vcruntime90.dll. Rust and its standard library don't support MSVC 2008. They are likely linked with something newer, like MSVC 2015 or 2017.
If we want compatibility with other binary Python extensions, we need to use a Python built with MSVC 2008 and linked against msvcr90.dll.
So, our options are:
- Build a custom Python 2.7 distribution with modern MSVC and drop support for 3rd party binary Python 2.7 extensions.
- Switch Mercurial to Python 3 and build Rust code with same toolchain as Python we target.
- Mix the CRTs.
#1 significantly undermines Mercurial's extensibility. Plus, Python 2.7 built for !MSVC 2008 isn't officially supported.
#2 is in progress. However, the timeline for officially supporting Python 3 to the point where we can transition the official distribution for it is likely too far out (2H 2018) and would hinder Rust adoption efforts.
That leaves mixing the CRTs. This would work by having the Rust components statically link a modern CRT while having Python dynamically load msvcr90.dll.
Mixing CRTs is dangerous because if you attempt to perform a multipart operation with multiple CRTs, things could blow up. e.g. if you malloc() in CRT A and free() in CRT B. Or attempt to operate on FILE instances across CRTs. More info at https://docs.microsoft.com/en-us/cpp/c-runtime-library/potential-errors-passing-crt-objects-across-dll-boundaries.
Fortunately, our exposure to the multiple CRT problem is significantly reduced because:
- Rust and its standard library doesn't make heavy use of CRT primitives.
Memory managed by Rust and Python is already being kept separate by the Python API. In Rust speak, we won't be transferring ownership of raw pointers between Rust and Python. Python's refcounting mechanism ensures all PyObject are destroyed by Python. The only time ownership of memory crosses the bridge is when we create something in Rust and pass it to Python. But that object will be a PyObject and backing memory would have been managed with the Python APIs.
We shouldn't be using FILE anywhere. And I/O on an open file descriptor would likely be limited to its created context. e.g. if we open a file from Rust, we're likely not reading it from Python.
We would have to keep a close eye out for CRT objects spanning multiple CRTs. We can mitigate exposure for bad patterns by establishing static analysis rules on source code. We can also examine the produced Rust binaries for symbol references and raise warnings when unwanted CRT functions are used by Rust code.
Mercurial relies on other entities (like Linux distros) to package and distribute Mercurial. This means we have to consider their support for packaging programs that use Rust or else we risk losing packagers. This means we need to consider:
* The minimum version of Rust to require * Whether we can use beta or nightly Rust features
For official Mercurial distributions, these considerations don't exist, as we'll be giving a binary to end-users. So this topic is all about our relationship with downstream packagers.
Packaging Overhaul Needed
If hg becomes a Rust binary and we want Mercurial to be a self-contained application, we'll need to overhaul our packaging mechanisms on all operating systems.
Mercurial would need to distribute a copy of Python.
Python insists that embedded Python load a pythonXX shared library. e.g. python27.dll or libpython27.so.
We would also need to distribute a copy of the Python standard library (.py, .pyc, etc files). These could be distributed in flat form (hundreds of .py files) or in a zip file. (Python supports importing modules from zip files.) If we wanted to get creative, we could invent our own archive format / module loading mechanism (but this feels like unnecessary work).
We can't prune the Python standard library of unused modules because Mercurial extensions may make use of any feature in the standard library. So we'll be distributing the entire Python standard library.
But the distribution of Python is not required: various packagers (like operating systems) would want Mercurial to use a Python provided to it. So our Rust hg needs to support loading a bundled Python and a Python provided to it. This can likely be controlled with build-time flags.
Mercurial could conceptually be distributed as a .zip file. That archive would contain pre-built hg.exe, pythonXX.dll, any other shared library dependencies, a copy of the Python standard library, Mercurial Python files, and any support files.
Because zip files aren't user friendly, we'd likely provide a standalone .exe or .msi installer (like we do today).
We could provide a self-contained archive file containing hg binary, libpython27.so, and any other dependencies. We could also provide rpm, deb, etc packages for popular distributions. These would be self-contained and not dependent any many (any?) other packages. Our biggest concern here is libc compatibility. That can be solved by static linking, compiling against a sufficiently old (and compatible) libc, or providing distro-specific packages.
Of course, many distros will want to provide their own Mercurial package. And they will likely want Mercurial to make use of the system Python. We can and must support this.
An issue with a self-contained distribution is loading of shared libraries. Not all operating systems and loaders may support loading of binary-relative shared libraries. We may need to hack something together that uses dlopen() to explicitly specify which libpython27.so, etc to load.
This is very similar to Linux. We may support the native application / installer mechanism to make things more user friendly. We don't have good support for this today. So it is likely most users will rely on Homebrew or MacPorts for installation.
BSDs / Solaris / Etc
Basically the same strategy as Linux.
PyPI / pip
We support installing Mercurial via pip today. We upload a source distribution to PyPI and anyone can pip install Mercurial to install Mercurial in their Python environment. On Windows (where users can't easily compile binary Python extensions), we provide Python wheels with pre-built Mercurial binaries.
The future of pip install Mercurial with an oxidized Mercurial is less clear.
pip is tailored towards Python applications. If Mercurial is a Rust application and Python is an implementation detail, does it make sense to use pip and PyPI as a distribution channel?
pip install Mercurial is very convenient (at least for the people that have pip installed and can run it). It is certainly easier than downloading and running an installer. So unless we bake an upgrade facility into Mercurial itself, pip install Mercurial is the next best thing for upgrading after the system package manager (apt, yum, brew, port, etc).
pip install Mercurial goes through a well-defined mechanism to take the artifact it downloaded from PyPI to install it. This mechanism could be abused to facilitate the use of PyPI/pip for distributing a self-contained Mercurial distribution. e.g. the user would end up with a Rust binary in PYTHONHOME/bin/hg that loads a custom version of Python and is fully self-contained and isolated from the Python it was pip installed into. This would be super hacky. It may not even be allowed by PyPI's hosting terms of service? But we could certainly abuse pip install if we needed to.