D7796: rust-nodemap: input/output primitives

gracinet (Georges Racinet) phabricator at mercurial-scm.org
Mon Jan 27 11:07:08 EST 2020


gracinet added inline comments.
gracinet marked 2 inline comments as done.

INLINE COMMENTS

> kevincox wrote in nodemap.rs:268
> I thought about this more and I think I am okay doing it this way. It seems like this should be well defined as long as there is no padding. However on that note I would want to add a check that there is no padding as expected. I would also like to ensure that this fails to compile if there is ever padding, even in release mode. I think this can be accomplished by something like:
> 
>   let _: [u8; 4 * BLOCK_SIZE] = std::mem::transmute([Block::new(); 4]);
> 
> I would probably want to repeat this check near any code that is relying on this invariant.

About a method to output to a writer: for the time being, we're avoiding doing I/O directly in Rust because most of it is shared with Python through layers that perform various sanitizations, lock management etc. That's why the simplest is to transfer bytes out.

Context note: the data structure (`Block`) is directly taken from the C version, which is non-persistent, but I believe that these 64 bytes are just some low-level programmer reflex. It's probably not a coincidence that IIRC that's also the length of cache lines on most current CPUs.

Anyway, the point of all of this is to get on disk without padding, so yes, we could implement our own non-padding by changing the definition of `Block` to `[u8; 64]`. In the end, we are forced to trust what's on disk anyway.

> kevincox wrote in nodemap.rs:272
> If someone panics between the `from_raw_parts` and `mem::forget` this is a double free. Right now I think this can't happen but it is completely possible that the code changes between then and now. I would prefer using `Vec::from_raw_parts` to make sure that there is only one `Vec` alive with that pointer at a time. This risks a leak in the face of panic but I think that is much better.

Not sure to understand what you suggest here, since I'm already using `Vec::from_raw_parts` in that method. I couldn't find an `into_raw_parts`.
Anyway, the official example <https://doc.rust-lang.org/std/vec/struct.Vec.html#examples-4>  has the `mem::forget` before  `Vec::from_raw_parts`. Would that be sufficient?

I agree that a leak upon some exceptional-cannot-happen condition is better than a double free.

Also, forgot to came back to that one in the latest phab update

> kevincox wrote in nodemap.rs:433
> This is a weird API. Why does new take a buffer and an offset? Can it take either a slice or just remove the offset parameter and always have the buffer start at 0? It is very strange to be passing in data that we own but isn't ever used.

Owning some memory and not using the whole of it is anyway what happens when that memory region is just obtained from a mmap (which is the whole point of carrying theses `Deref` all around).
Technically, in a mmap, I suppose we actually only own the file descriptor and the resulting pointer, but we can't just convert it to an inert slice.

Anyway it's now confirmed that we won't be needing the offset finally, so I've removed it.  Some data at the end of the bytes slice may be ignored, too : it would be the result of aborted transactions.

Note: the Python layer will maintain the amount of validated blocks in a separate small file which is updated atomically. Any future reimplementation of this in Rust would have to do the same.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7796/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel


More information about the Mercurial-devel mailing list