Rust extensions: the next step

Georges Racinet gracinet at anybox.fr
Thu Oct 18 13:15:06 EDT 2018


On 10/18/2018 04:09 PM, Yuya Nishihara wrote:
> On Thu, 18 Oct 2018 08:58:04 -0400, Josef 'Jeff' Sipek wrote:
>> On Thu, Oct 18, 2018 at 12:22:16 +0200, Gregory Szorc wrote:
>> ...
>>> Something else we may want to consider is a single Python module exposing
>>> the Rust code instead of N. Rust’s more aggressive cross function
>>> compilation optimization could result in better performance if everything
>>> is linked/exposed in a single shared library/module/extension. Maybe this
>>> is what you are proposing? It is unclear if Rust code is linked into the
>>> Python extension or loaded from a shared library.
>> (Warning: I suck at python, am not an expert on rust, but have more
>> knowledge about ELF linking/loading/etc. than is healthy.)
>>
>> Isn't there also a distinction between code layout (separate crates) and the
>> actual binary that cargo/rustc builds?  IOW, the code could (and probably
>> should) be nicely separated but rustc can combine all the crates' code into
>> one big binary for loading into python.  Since it would see all the code, it
>> can do its fancy optimizations without impacting code readability.
> IIUC, it is. Perhaps, the rustext is a single binary exporting multiple
> submodules?
Yes, totally: it's exactly as Josef writes. To demonstrate, here's what I
have:

$ ls mercurial/*.so
mercurial/rustext.so  mercurial/zstd.so
$ python
Python 2.7.13 (default, Nov 24 2017, 17:33:09)
[GCC 6.3.0 20170516] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from mercurial import rustext
>>> dir(rustext)
['GraphError', '__doc__', '__file__', '__name__', '__package__',
'ancestors']
>>> from mercurial.rustext import ancestors
>>> ancestors is rustext.ancestors
True
>>> dir(ancestors)
['AncestorsIterator', '__doc__', '__name__', '__package__']

So, in short, it's a single shared library that can hold a bunch of
modules. The submodules are themselves initialized from the Rust code.
Here's the definition of 'rustext' itself. It follows the pattern
expected by Josef.

$ tail rust/hg-cpython/src/lib.rs
mod ancestors;  // corresponds to src/ancestors.rs
mod exceptions;

py_module_initializer!(rustext, initrustext, PyInit_rustext, |py, m| {
    m.add(py, "__doc__", "Mercurial core concepts - Rust implementation")?;

    m.add(py, "ancestors", ancestors::init_module(py)?)?;
    m.add(py, "GraphError", py.get_type::<exceptions::GraphError>())?;
    Ok(())
});

(Mark confirmed to me during the sprint that adding submodules on the
fly was doable).
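
For those who read Python more easily than Rust, here is a minimal
pure-Python sketch of the same layout: one parent module object that
creates its submodule in code and exposes it as an attribute. The names
demo_rustext and build_extension are made up for illustration, and this
only mirrors the resulting module structure, not what the Rust side
actually does internally.

```python
import sys
import types


def build_extension(name):
    """Mimic a single extension module that creates its own submodules."""
    parent = types.ModuleType(name, "stand-in for the rustext module")

    # The submodule is created in code, not discovered on the filesystem.
    child = types.ModuleType(name + ".ancestors", "stand-in submodule")
    child.AncestorsIterator = type("AncestorsIterator", (object,), {})

    # Attaching it as an attribute is what lets
    # `from <name> import ancestors` succeed.
    parent.ancestors = child
    return parent


# Hypothetical module name, chosen so it cannot clash with a real one.
demo = build_extension("demo_rustext")
sys.modules[demo.__name__] = demo

from demo_rustext import ancestors

print(ancestors is demo.ancestors)
print(hasattr(ancestors, "AncestorsIterator"))
```

Note that in this sketch only the `from demo_rustext import ancestors`
form is supported: the submodule is not registered in sys.modules under
its dotted name, so a plain `import demo_rustext.ancestors` would fail.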

Indeed, I hope the Rust compiler can do lots of optimizations in that
single shared library object.
>
> I expect "rustext" (or its upper layer) to be a shim over Rust-based modules
> and cexts. So if you do policy.importmod('parsers'), it will return
> cext.parsers, whereas policy.importmod('ancestor') will return rustext.ancestor,
> though I have no idea if there will be cext/pure.ancestor.
Yes, it's quite possible to add a new module policy this way. After all,
as seen from mercurial.policy, rustext behaves in the same way as the cext
package does, and the fact that we have a single shared library instead
of several is an implementation detail, hidden by Python's import
machinery.
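
To make the idea concrete, here is a rough sketch of such a lookup. It
is not the actual mercurial.policy code: resolve_module, BACKENDS and
the stand-in packages are all hypothetical, and the only point is that
the caller asks for a module by name and does not care which backend
package provides it.

```python
import types

# Stand-ins for the real packages; only attribute lookup matters here.
cext = types.SimpleNamespace(parsers=types.ModuleType("cext.parsers"))
rustext = types.SimpleNamespace(
    ancestors=types.ModuleType("rustext.ancestors"))

# Hypothetical preference order: try the Rust backend, then the C one.
BACKENDS = [rustext, cext]


def resolve_module(modname):
    """Return the first backend module called `modname`."""
    for package in BACKENDS:
        mod = getattr(package, modname, None)
        if mod is not None:
            return mod
    raise ImportError("no backend provides %r" % modname)


print(resolve_module("parsers").__name__)
print(resolve_module("ancestors").__name__)
```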

But this opens another, longer term, question: currently what I have in
mercurial.rustext.ancestors covers only a fragment of what
mercurial.ancestor provides. Therefore, to have mercurial.policy handle
it, we'll need either to take such partial cases into account, or to
decide to translate the whole Python module into Rust. For the time
being, I'm simply doing an import and catching the error to fall back to
the Python version.
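
In outline, the fallback looks something like the sketch below. It is
only an illustration: pick and PureAncestorsIterator are hypothetical
names, and the per-name lookup is one possible way to cope with a Rust
module that implements only part of the Python one.

```python
try:
    # Same import form as in the interpreter session above.
    from mercurial.rustext import ancestors as rustancestors
except ImportError:
    # No Mercurial, or a build without the Rust extension.
    rustancestors = None


def pick(name, fast_mod, fallback):
    """Return `name` from fast_mod if it has it, else the fallback."""
    return getattr(fast_mod, name, fallback)


class PureAncestorsIterator(object):
    """Placeholder for the pure Python implementation."""


AncestorsIterator = pick(
    "AncestorsIterator", rustancestors, PureAncestorsIterator)
```

Because the lookup is done name by name, anything the Rust module does
not provide yet silently keeps using the Python version.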

Regards,

-- 
Georges Racinet
Anybox SAS, http://anybox.fr
Phone: +33 6 51 32 07 27
GPG: B59E 22AB B842 CAED 77F7 7A7F C34F A519 33AB 0A35, on public servers



