Using Rust in Mercurial

This page describes the plan and status for leveraging the Rust programming language in Mercurial.

Desired End State

Current Status

(last updated November 2017)

Priorities for Oxidation

All existing C code is a priority for oxidation because we don't like maintaining C code for safety and compatibility reasons. Existing C code includes:

In addition, the following would be good candidates for oxidation:


CRT Mismatch on Windows

Mercurial still uses Python 2.7. Python 2.7 is officially compiled with MSVC 2008 and links against vcruntime90.dll. Rust and its standard library don't support MSVC 2008. They are likely linked with something newer, like MSVC 2015 or 2017.

If we want compatibility with other binary Python extensions, we need to use a Python built with MSVC 2008 and linked against msvcr90.dll.

So, our options are:

  1. Build a custom Python 2.7 distribution with modern MSVC and drop support for 3rd party binary Python 2.7 extensions.
  2. Switch Mercurial to Python 3 and build Rust code with same toolchain as Python we target.
  3. Mix the CRTs.

#1 significantly undermines Mercurial's extensibility. Plus, Python 2.7 built for !MSVC 2008 isn't officially supported.

#2 is in progress. However, the timeline for officially supporting Python 3 to the point where we can transition the official distribution for it is likely too far out (2H 2018) and would hinder Rust adoption efforts.

That leaves mixing the CRTs. This would work by having the Rust components statically link a modern CRT while having Python dynamically load msvcr90.dll.

Mixing CRTs is dangerous because if you attempt to perform a multipart operation with multiple CRTs, things could blow up. e.g. if you malloc() in CRT A and free() in CRT B. Or attempt to operate on FILE instances across CRTs. More info at See also for a full list of CRT functions.

Fortunately, our exposure to the multiple CRT problem is significantly reduced because:

We would have to keep a close eye out for CRT objects spanning multiple CRTs. We can mitigate exposure for bad patterns by establishing static analysis rules on source code. We can also examine the produced Rust binaries for symbol references and raise warnings when unwanted CRT functions are used by Rust code.

Rust Support

Mercurial relies on other entities (like Linux distros) to package and distribute Mercurial. This means we have to consider their support for packaging programs that use Rust or else we risk losing packagers. This means we need to consider:

* The minimum version of Rust to require * Whether we can use beta or nightly Rust features

For official Mercurial distributions, these considerations don't exist, as we'll be giving a binary to end-users. So this topic is all about our relationship with downstream packagers.

Packaging Overhaul Needed

If hg becomes a Rust binary and we want Mercurial to be a self-contained application, we'll need to overhaul our packaging mechanisms on all operating systems.

Distributing Python

Mercurial would need to distribute a copy of Python.

Python insists that embedded Python load a pythonXX shared library. e.g. python27.dll or

We would also need to distribute a copy of the Python standard library (.py, .pyc, etc files). These could be distributed in flat form (hundreds of .py files) or in a zip file. (Python supports importing modules from zip files.) If we wanted to get creative, we could invent our own archive format / module loading mechanism (but this feels like unnecessary work).

We can't prune the Python standard library of unused modules because Mercurial extensions may make use of any feature in the standard library. So we'll be distributing the entire Python standard library.

But the distribution of Python is not required: various packagers (like operating systems) would want Mercurial to use a Python provided to it. So our Rust hg needs to support loading a bundled Python and a Python provided to it. This can likely be controlled with build-time flags.


Mercurial could conceptually be distributed as a .zip file. That archive would contain pre-built hg.exe, pythonXX.dll, any other shared library dependencies, a copy of the Python standard library, Mercurial Python files, and any support files.

Because zip files aren't user friendly, we'd likely provide a standalone .exe or .msi installer (like we do today).


We could provide a self-contained archive file containing hg binary,, and any other dependencies. We could also provide rpm, deb, etc packages for popular distributions. These would be self-contained and not dependent any many (any?) other packages. Our biggest concern here is libc compatibility. That can be solved by static linking, compiling against a sufficiently old (and compatible) libc, or providing distro-specific packages.

Of course, many distros will want to provide their own Mercurial package. And they will likely want Mercurial to make use of the system Python. We can and must support this.

An issue with a self-contained distribution is loading of shared libraries. Not all operating systems and loaders may support loading of binary-relative shared libraries. We may need to hack something together that uses dlopen() to explicitly specify which, etc to load.


This is very similar to Linux. We may support the native application / installer mechanism to make things more user friendly. We don't have good support for this today. So it is likely most users will rely on Homebrew or MacPorts for installation.

BSDs / Solaris / Etc

Basically the same strategy as Linux.

PyPI / pip

We support installing Mercurial via pip today. We upload a source distribution to PyPI and anyone can pip install Mercurial to install Mercurial in their Python environment. On Windows (where users can't easily compile binary Python extensions), we provide Python wheels with pre-built Mercurial binaries.

The future of pip install Mercurial with an oxidized Mercurial is less clear.

pip is tailored towards Python applications. If Mercurial is a Rust application and Python is an implementation detail, does it make sense to use pip and PyPI as a distribution channel?

pip install Mercurial is very convenient (at least for the people that have pip installed and can run it). It is certainly easier than downloading and running an installer. So unless we bake an upgrade facility into Mercurial itself, pip install Mercurial is the next best thing for upgrading after the system package manager (apt, yum, brew, port, etc).

pip install Mercurial goes through a well-defined mechanism to take the artifact it downloaded from PyPI to install it. This mechanism could be abused to facilitate the use of PyPI/pip for distributing a self-contained Mercurial distribution. e.g. the user would end up with a Rust binary in PYTHONHOME/bin/hg that loads a custom version of Python and is fully self-contained and isolated from the Python it was pip installed into. This would be super hacky. It may not even be allowed by PyPI's hosting terms of service? But we could certainly abuse pip install if we needed to.

Support for PyPy / non-CPython Pythons

There exist Python distributions beyond the official CPython distribution. PyPy likely being the one of the most interest to us because of its performance advantages.

The cost to supporting non-CPython Pythons when hg is a Rust binary could be very high. That would likely significantly curtail the use of the CPython API. Instead, we'd have to do interop via ctypes or cffi or provide N ways to do interop.

It's worth noting that if Mercurial is a self-contained application, we could potentially swap out CPython for PyPy. We could go as far as to unsupport CPython completely.

Rust <=> Python Interop

Rust and Python code will need to call into each other. (Although it is anticipated that the bulk of the calling will be from Python into Rust code - at least initially.)

There are many options for us here.

python27-sys and python3-sys are low-level Rust bindings to the CPython API. Lots of unsafe {} code here.

rust-cpython and PyO3 are higher-level bindings to python27-sys and python3-sys. They are what you want to use for day-to-day Rust programming.

PyO3 is a fork of rust-cpython. It seems to be a bit nicer. But it requires Nightly Rust features.

Milksnake uses Rust's cbindgen crate to automatically generate Python cffi bindings to Rust libraries. Essentially, you write a Rust library that exports symbols and milksnake can generate a Python binding to it. There's a lot going on. But it is definitely an interesting approach. And some of the components are useful without the rest of milksnake. e.g. the idea of using cbindgen + cffi to generate low-level Python bindings. Because Milksnake uses cffi, the approach should work with both CPython and PyPy.

A major reason for adopting Rust (and C before that) is performance. We know from Mercurial's C extensions that native code is often vasly undermined by a) crossing the Python<->native boundary b) excessive use of Python API from native code. For example, obsolescence marker parsing is ~100x faster in C. However, once you construct PyObject for all the parsed markers, it is only 2-4x faster.

We know that using ctypes to call from Python into native code is significantly slower than binary Python extensions. Although if the number of function calls and data being transferred across the boundary is small, this difference isn't as pronounced. Rust will enable us to write more functionality in native code (we try to avoid writing C today for maintainability and security reasons). So the performance of the Python<->native bridge will be more important over time. Therefore, it seems prudent to rule out ctypes. That leaves us with extensions or CFFI.

Reconciling `hg` with Rust extensions

Initially, hg will be a minimal Rust binary that embeds a Python interpreter. It simply tells the interpreter to invoke Mercurial's main() function. In this world, other Rust functionality is likely loaded via shared libraries or Python extensions. In other words, we have multiple Rust contexts running from different binaries (an executable and a shared library). The executable handles very early process activity. The shared library handles business logic.

Over time, we'll likely want to expand the role of Rust for early process activity. For example, we'll need to implement some command line processing in Rust for chg functionality. We may also want to implement config file loading (we need to rewrite the config parser anyway to facilitate writing back config changes). And, if we could load a repo from disk and maybe even implement performance critical commands (like hg status) from pure Rust, this would likely be a massive performance win. (Although we have to consider how this will interact with extensibility.)

What this means is that we'll have multiple Rust binaries holding Mercurial state. This feels brittle. Ideally we'd have a single Rust binary. If Python needed to call into native/Rust code, it would get those symbols from the parent hg binary instead of from a shared library. It is unclear how this would work. It is obviously possible to resolve the address of a symbol in the current binary. But existing "call native code" mechanisms in Python seem to assume that symbols are coming from loaded libraries, not the current executable. This may require modifications to cffi or some custom code to generate the Python bindings to executable-local symbols.

Preserving Support for Extensions

Mercurial implemented in Python is good for extensibility because it means extensions can customize nearly every part of Mercurial - often via monkeypatching.

As we use more Rust, we no longer have the dynamic nature of Python and extensions will lose some of their power.

As more of hg is implemented in Rust before any Python is called, we could lose the ability for Python extensions to influence low-level and core operations. e.g. if we want to implement hg status such that it doesn't invoke Python and incur Python startup overhead, how do we enable extensions to still influence the behavior of hg status?

Presumably, hg will eventually implement config file loading and command line processing. So, Rust will be able to see which extensions are being loaded. Assuming hg can resolve the paths to loaded extensions, we could add a syntax to the main extensions file to declare their influence on various behavior. For example, if an extension influences behavior of hg status, its source code could contain something like: # hgext-influences: cmd-status. Rust would see this special syntax and know it needs to instantiate a Python interpreter in order to load the extension for the current hg status command. We could also imagine doing something similar for other functionality implemented in Rust, such as the core store interface.

CategoryNewFeatures CategoryNewFeatures