D6631: rust-cpython: add macro for sharing references
Alphare (Raphaël Gomès)
phabricator at mercurial-scm.org
Wed Jul 10 11:48:55 UTC 2019
Alphare created this revision.
Herald added subscribers: mercurial-devel, kevincox, durin42.
Herald added a reviewer: hg-reviewers.
REVISION SUMMARY
Following an experiment done by Georges Racinet, we now have a working way of
sharing references between Python and Rust. This is needed in many points of
the codebase, for example every time we need to expose an iterator to a
Rust-backed Python class.
In a few words, references are (unsafely) marked as `'static` and coupled
with manual reference counting; we are doing manual borrow-checking.
This changes introduces two declarative macro to help reduce boilerplate.
While it is better than not using macros, they are not perfect. They need to:
- Integrate with the garbage collector for container types (not needed
as of yet), as stated in the docstring
- Allow for leaking multiple attributes at the same time
- Inject the `leaked_count` data attribute in `py_class`-generated
structs
- Automatically namespace the functions and attributes they generate
For at least the last two points, we will need to write a procedural macro
instead of a declarative one.
While this reference-sharing mechanism is being ironed out I thought it best
not to implement it yet.
Lastly, and implementation detail renders our Rust-backed Python iterators too
strict to be proper drop-in replacements, as will be illustrated in a future
patch: if the data structure referenced by a non-depleted iterator is mutated,
an `AlreadyBorrowed` exception is raised, whereas Python would allow it, only
to raise a `RuntimeError` if `next` is called on said iterator. This will have
to be addressed at some point.
REPOSITORY
rHG Mercurial
REVISION DETAIL
https://phab.mercurial-scm.org/D6631
AFFECTED FILES
rust/hg-cpython/src/dirstate/dirs_multiset.rs
rust/hg-cpython/src/dirstate/mod.rs
rust/hg-cpython/src/exceptions.rs
rust/hg-cpython/src/lib.rs
rust/hg-cpython/src/macros.rs
CHANGE DETAILS
diff --git a/rust/hg-cpython/src/macros.rs b/rust/hg-cpython/src/macros.rs
new file mode 100644
--- /dev/null
+++ b/rust/hg-cpython/src/macros.rs
@@ -0,0 +1,254 @@
+// macros.rs
+//
+// Copyright 2019 Raphaël Gomès <rgomes at octobus.net>
+//
+// This software may be used and distributed according to the terms of the
+// GNU General Public License version 2 or any later version.
+
+//! Macros for use in the `hg-cpython` bridge library.
+
+/// Allows a `py_class!` generated struct to share references to one of its
+/// data members with Python.
+///
+/// # Warning
+///
+/// The targeted `py_class!` needs to have the
+/// `data leak_count: RefCell<usize>;` data attribute to compile.
+/// A better, more complicated macro is needed to automatically insert the
+/// leak count, which is not yet really battle tested (what happens when
+/// multiple references are needed?). See the example below.
+///
+/// TODO allow Python container types: for now, integration with the garbage
+/// collector does not extend to Rust structs holding references to Python
+/// objects. Should the need surface, `__traverse__` and `__clear__` will
+/// need to be written as per the `rust-cpython` docs on GC integration.
+///
+/// # Parameters
+///
+/// * `$name` is the same identifier used in for `py_class!` macro call.
+/// * `$inner_struct` is the identifier of the underlying Rust struct
+/// * `$data_member` is the identifier of the data member of `$inner_struct`
+/// that will be shared.
+/// * `$leaked` is the identifier to give to the struct that will manage
+/// references to `$name`, to be used for example in other macros like
+/// `py_shared_mapping_iterator`.
+///
+/// # Example
+///
+/// ```
+/// struct MyStruct {
+/// inner: Vec<u32>;
+/// }
+///
+/// py_class!(pub class MyType |py| {
+/// data inner: RefCell<MyStruct>;
+/// data leak_count: RefCell<usize>;
+/// });
+///
+/// py_shared_ref!(MyType, MyStruct, inner, MyTypeLeakedRef);
+/// ```
+macro_rules! py_shared_ref {
+ (
+ $name: ident,
+ $inner_struct: ident,
+ $data_member: ident,
+ $leaked: ident
+ ) => {
+ impl $name {
+ fn leak_immutable(&self, py: Python) -> &'static $inner_struct {
+ let ptr = self.$data_member(py).as_ptr();
+ *self.leak_count(py).borrow_mut() += 1;
+ unsafe { &*ptr }
+ }
+
+ fn borrow_mut<'a>(
+ &'a self,
+ py: Python<'a>,
+ ) -> PyResult<RefMut<$inner_struct>> {
+ match *self.leak_count(py).borrow() {
+ 0 => Ok(self.$data_member(py).borrow_mut()),
+ // TODO
+ // For now, this works differently than Python references
+ // in the case of iterators.
+ // Python does not complain when the data an iterator
+ // points to is modified if the iterator is never used
+ // afterwards.
+ // Here, we are stricter than this by refusing to give a
+ // mutable reference if it is already borrowed.
+ // While the additional safety might be argued for, it
+ // breaks valid programming patterns in Python and we need
+ // to fix this issue down the line.
+ _ => Err(AlreadyBorrowed::new(
+ py,
+ "Cannot borrow mutably while there are \
+ immutable references in Python objects",
+ )),
+ }
+ }
+
+ fn decrease_leak_count(&self, py: Python) {
+ *self.leak_count(py).borrow_mut() -= 1;
+ }
+ }
+
+ /// Manage immutable references to `$name` leaked into Python
+ /// iterators.
+ ///
+ /// In truth, this does not represent leaked references themselves;
+ /// it is instead useful alongside them to manage them.
+ pub struct $leaked {
+ inner: $name,
+ }
+
+ impl $leaked {
+ fn new(py: Python, inner: &$name) -> Self {
+ Self {
+ inner: inner.clone_ref(py),
+ }
+ }
+ }
+
+ impl Drop for $leaked {
+ fn drop(&mut self) {
+ let gil = Python::acquire_gil();
+ let py = gil.python();
+ self.inner.decrease_leak_count(py);
+ }
+ }
+ };
+}
+
+/// Defines a `py_class!` that acts as a Python iterator over a Rust iterator.
+macro_rules! py_shared_iterator_impl {
+ (
+ $name: ident,
+ $leaked: ident,
+ $iterator_type: ty,
+ $success_func: expr,
+ $success_type: ty
+ ) => {
+ py_class!(pub class $name |py| {
+ data inner: RefCell<Option<$leaked>>;
+ data it: RefCell<$iterator_type>;
+
+ def __next__(&self) -> PyResult<$success_type> {
+ let mut inner_opt = self.inner(py).borrow_mut();
+ if inner_opt.is_some() {
+ match self.it(py).borrow_mut().next() {
+ None => {
+ // replace Some(inner) by None, drop $leaked
+ inner_opt.take();
+ Ok(None)
+ }
+ Some(res) => {
+ $success_func(py, res)
+ }
+ }
+ } else {
+ Ok(None)
+ }
+ }
+
+ def __iter__(&self) -> PyResult<Self> {
+ Ok(self.clone_ref(py))
+ }
+ });
+
+ impl $name {
+ pub fn from_inner(
+ py: Python,
+ leaked: Option<$leaked>,
+ it: $iterator_type
+ ) -> PyResult<Self> {
+ Self::create_instance(
+ py,
+ RefCell::new(leaked),
+ RefCell::new(it)
+ )
+ }
+ }
+ };
+}
+
+/// Defines a `py_class!` that acts as a Python mapping iterator over a Rust
+/// iterator.
+///
+/// TODO: this is a bit awkward to use, and a better (more complicated)
+/// procedural macro would simplify the interface a lot.
+///
+/// # Parameters
+///
+/// * `$name` is the identifier to give to the resulting Rust struct.
+/// * `$leaked` corresponds to `$leaked` in the matching `py_shared_ref!` call.
+/// * `$iterator_type` is the iterator type
+/// (like `std::collections::hash_map::Iter`).
+/// * `$key_type` is the type of the key in the mapping
+/// * `$value_type` is the type of the value in the mapping
+/// * `$success_func` is a function for processing the Rust `(key, value)`
+/// tuple on iteration success, turning it into something Python understands.
+/// * `$success_func` is the return type of `$success_func`
+///
+/// # Example
+///
+/// ```
+/// struct MyStruct {
+/// inner: HashMap<Vec<u8>, Vec<u8>>;
+/// }
+///
+/// py_class!(pub class MyType |py| {
+/// data inner: RefCell<MyStruct>;
+/// data leak_count: RefCell<usize>;
+///
+/// def __iter__(&self) -> PyResult<MyTypeItemsIterator> {
+/// MyTypeItemsIterator::create_instance(
+/// py,
+/// RefCell::new(Some(MyTypeLeakedRef::new(py, &self))),
+/// RefCell::new(self.leak_immutable(py).iter()),
+/// )
+/// }
+/// });
+///
+/// impl MyType {
+/// fn translate_key_value(
+/// py: Python,
+/// res: (&Vec<u8>, &Vec<u8>),
+/// ) -> PyResult<Option<(PyBytes, PyBytes)>> {
+/// let (f, entry) = res;
+/// Ok(Some((
+/// PyBytes::new(py, f),
+/// PyBytes::new(py, entry),
+/// )))
+/// }
+/// }
+///
+/// py_shared_ref!(MyType, MyStruct, inner, MyTypeLeakedRef);
+///
+/// py_shared_mapping_iterator!(
+/// MyTypeItemsIterator,
+/// MyTypeLeakedRef,
+/// std::collections::hash_map::Iter,
+/// Vec<u8>,
+/// Vec<u8>,
+/// MyType::translate_key_value,
+/// Option<(PyBytes, PyBytes)>
+/// );
+/// ```
+macro_rules! py_shared_mapping_iterator {
+ (
+ $name:ident,
+ $leaked:ident,
+ $iterator_type: ident,
+ $key_type: ty,
+ $value_type: ty,
+ $success_func: path,
+ $success_type: ty
+ ) => {
+ py_shared_iterator_impl!(
+ $name,
+ $leaked,
+ $iterator_type<'static, $key_type, $value_type>,
+ $success_func,
+ $success_type
+ );
+ };
+}
\ No newline at end of file
diff --git a/rust/hg-cpython/src/lib.rs b/rust/hg-cpython/src/lib.rs
--- a/rust/hg-cpython/src/lib.rs
+++ b/rust/hg-cpython/src/lib.rs
@@ -27,6 +27,8 @@
pub mod ancestors;
mod cindex;
mod conversion;
+#[macro_use]
+pub mod macros;
pub mod dagops;
pub mod dirstate;
pub mod parsers;
diff --git a/rust/hg-cpython/src/exceptions.rs b/rust/hg-cpython/src/exceptions.rs
--- a/rust/hg-cpython/src/exceptions.rs
+++ b/rust/hg-cpython/src/exceptions.rs
@@ -65,3 +65,5 @@
}
}
}
+
+py_exception!(shared_ref, AlreadyBorrowed, RuntimeError);
diff --git a/rust/hg-cpython/src/dirstate/mod.rs b/rust/hg-cpython/src/dirstate/mod.rs
--- a/rust/hg-cpython/src/dirstate/mod.rs
+++ b/rust/hg-cpython/src/dirstate/mod.rs
@@ -29,6 +29,7 @@
mod dirs_multiset;
use dirstate::dirs_multiset::Dirs;
use std::convert::TryFrom;
+use exceptions::AlreadyBorrowed;
/// C code uses a custom `dirstate_tuple` type, checks in multiple instances
/// for this type, and raises a Python `Exception` if the check does not pass.
@@ -99,6 +100,7 @@
m.add(py, "__doc__", "Dirstate - Rust implementation")?;
m.add_class::<Dirs>(py)?;
+ m.add(py, "AlreadyBorrowed", py.get_type::<AlreadyBorrowed>())?;
let sys = PyModule::import(py, "sys")?;
let sys_modules: PyDict = sys.get(py, "modules")?.extract(py)?;
diff --git a/rust/hg-cpython/src/dirstate/dirs_multiset.rs b/rust/hg-cpython/src/dirstate/dirs_multiset.rs
--- a/rust/hg-cpython/src/dirstate/dirs_multiset.rs
+++ b/rust/hg-cpython/src/dirstate/dirs_multiset.rs
@@ -8,22 +8,25 @@
//! Bindings for the `hg::dirstate::dirs_multiset` file provided by the
//! `hg-core` package.
-use std::cell::RefCell;
+use std::cell::{RefCell, RefMut};
+use std::collections::hash_map::Iter;
+use std::convert::TryInto;
use cpython::{
- exc, ObjectProtocol, PyBytes, PyDict, PyErr, PyObject, PyResult,
- ToPyObject,
+ exc, ObjectProtocol, PyBytes, PyClone, PyDict, PyErr, PyObject, PyResult,
+ Python,
};
use dirstate::extract_dirstate;
+use exceptions::AlreadyBorrowed;
use hg::{
DirsIterable, DirsMultiset, DirstateMapError, DirstateParseError,
EntryState,
};
-use std::convert::TryInto;
py_class!(pub class Dirs |py| {
- data dirs_map: RefCell<DirsMultiset>;
+ data inner: RefCell<DirsMultiset>;
+ data leak_count: RefCell<usize>;
// `map` is either a `dict` or a flat iterator (usually a `set`, sometimes
// a `list`)
@@ -59,18 +62,18 @@
)
};
- Self::create_instance(py, RefCell::new(inner))
+ Self::create_instance(py, RefCell::new(inner), RefCell::new(0))
}
def addpath(&self, path: PyObject) -> PyResult<PyObject> {
- self.dirs_map(py).borrow_mut().add_path(
+ self.borrow_mut(py)?.add_path(
path.extract::<PyBytes>(py)?.data(py),
);
Ok(py.None())
}
def delpath(&self, path: PyObject) -> PyResult<PyObject> {
- self.dirs_map(py).borrow_mut().delete_path(
+ self.borrow_mut(py)?.delete_path(
path.extract::<PyBytes>(py)?.data(py),
)
.and(Ok(py.None()))
@@ -89,30 +92,43 @@
})
}
- // This is really inefficient on top of being ugly, but it's an easy way
- // of having it work to continue working on the rest of the module
- // hopefully bypassing Python entirely pretty soon.
- def __iter__(&self) -> PyResult<PyObject> {
- let dict = PyDict::new(py);
-
- for (key, value) in self.dirs_map(py).borrow().iter() {
- dict.set_item(
- py,
- PyBytes::new(py, &key[..]),
- value.to_py_object(py),
- )?;
- }
-
- let locals = PyDict::new(py);
- locals.set_item(py, "obj", dict)?;
-
- py.eval("iter(obj)", None, Some(&locals))
+ def __iter__(&self) -> PyResult<DirsMultisetKeysIterator> {
+ DirsMultisetKeysIterator::create_instance(
+ py,
+ RefCell::new(Some(DirsMultisetLeakedRef::new(py, &self))),
+ RefCell::new(self.leak_immutable(py).iter()),
+ )
}
def __contains__(&self, item: PyObject) -> PyResult<bool> {
Ok(self
- .dirs_map(py)
+ .inner(py)
.borrow()
.contains_key(item.extract::<PyBytes>(py)?.data(py).as_ref()))
}
});
+
+py_shared_ref!(Dirs, DirsMultiset, inner, DirsMultisetLeakedRef);
+
+impl Dirs {
+ pub fn from_inner(py: Python, d: DirsMultiset) -> PyResult<Self> {
+ Self::create_instance(py, RefCell::new(d), RefCell::new(0))
+ }
+
+ fn translate_key(
+ py: Python,
+ res: (&Vec<u8>, &u32),
+ ) -> PyResult<Option<PyBytes>> {
+ Ok(Some(PyBytes::new(py, res.0)))
+ }
+}
+
+py_shared_mapping_iterator!(
+ DirsMultisetKeysIterator,
+ DirsMultisetLeakedRef,
+ Iter,
+ Vec<u8>,
+ u32,
+ Dirs::translate_key,
+ Option<PyBytes>
+);
To: Alphare, #hg-reviewers
Cc: durin42, kevincox, mercurial-devel
More information about the Mercurial-devel
mailing list