D2057: rust implementation of hg status

indygreg (Gregory Szorc) phabricator at mercurial-scm.org
Thu Mar 8 02:31:25 EST 2018


indygreg added a comment.


  First of all - wow! Thanks for writing all this code. There's definitely a lot to work with. And work with it we will!
  
  This is definitely too big to review as one commit. If you could do *any* work to split it up, it would be greatly appreciated. I'd focus on the pure Rust pieces first. Everything needed to open revlogs would be great!
  
  You may find our custom Phabricator extensions (linked from https://www.mercurial-scm.org/wiki/Phabricator) useful for submitting series of commits to Phabricator.
  
  Regarding the performance, that's pretty good! The dirstate code is some of the most optimized code in Mercurial. There are some gnarly Python C hacks to make it fast. Some of those tricks involve using special system calls to walk directories to minimize the number of system calls. I'm not sure if the crate you imported has those optimizations. (I wouldn't be surprised either way.) I wouldn't worry too much about performance at this juncture. But I suspect we could make the Rust code another 50% faster with some tweaking. It would also be interesting to test on a larger repo, say https://hg.mozilla.org/mozilla-unified. Also, I believe there are hooks in the dirstate code to use Watchman (fsmonitor). Those hooks are critical in order to achieve peak performance on large repositories.
  
  Since you seem to be proficient at writing lots of Rust code, if you are looking for another project, may I suggest porting `chg` to Rust? That code is in `contrib/chg`. That might be the easiest component to actually ship in Rust since it is a standalone binary that doesn't link against Python. But we shouldn't get ahead of ourselves :)
  
  Anyway, it is late for me and I need to detach from my computer. I'm sure others will have things to say as well...

INLINE COMMENTS

> build.rs:1
> +// build.rs -- Configure build environment for `hgcli` Rust package.
> +//

I see this file was copied. There's nothing wrong with that. But does this mean we will need a custom build.rs for each Rust package doing Python? If that's the case, then I would prefer to isolate all our rust-cpython code to a single package, if possible. I'm guessing that could be challenging due to crossing create boundaries. I'm sure there are placed where we don't want to expose symbols outside the crate.

I'm curious how others feel about this.

> main.rs:233-261
> +    let matches = clap::App::new("hg rust oxidation")
> +        .arg(
> +            clap::Arg::with_name("repository")
> +                .short("c")
> +                .long("repository")
> +                .value_name("dash_r"),
> +        )

This is definitely nifty and an impressive achievement \o/

The `r-` commands for testing pure Rust code paths are an interesting idea!

I think I'm OK with including support for this in `hgcli`. But I think the code should live in a separate file so it doesn't pollute `main()`. And it should be behind a Cargo feature flag so we maintain compatibility with `hg` as much as possible by default.

Also, Mercurial's command line parser is extremely wonky and has some questionable behavior. If the intent is to make `rhg` compatible with `hg`, we would need to preserve this horrible behavior. We'll likely have to write a custom argument parser because of how quirky Mercurial's argument parser is :(

> config.rs:95
> +        } else {
> +            RevlogFormat::V0
> +        }

I would not worry about supporting v0 or v2 at this time. v0 is only important for backwards compatibility with ancient repos. And v2 never got off the ground.

> revlog_v1.rs:279
> +        let mut fhandle = BufReader::new(match self.dflag {
> +            Inline => fs::File::open(self.index_file.as_path()).unwrap(),
> +            Separated(ref dfile) => fs::File::open(dfile).unwrap(),

IIRC, core Mercurial keeps an open file handle on revlogs and ensures we don't run out of file handles by not keeping too many revlogs open at the same time. For scanning operations, not having to open and close the file handles all the time will make a difference for performance. Also, core Mercurial loads the entirety of the `.i` file into memory. That's a scaling problem for large revlogs. But it does make performance of index lookups really fast.

> revlog_v1.rs:290-293
> +        while let Some(ref chld_r) = it.next() {
> +            if let Some(bin) = self.get_content(&mut fhandle, chld_r) {
> +                bins.push(bin);
> +            } else {

A thread pool to help with zlib decompression should go a long way here.

Probably too early to think about this, but we'll likely eventually want a global thread pool for doing I/O and CPU expensive tasks, such as reading chunks from a revlog and decompressing them.

FWIW, we're going to radically alter the storage format in order to better support shallow clones. But that work hasn't started yet. I still think there is a benefit to implementing the revlog code in Rust though.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers
Cc: krbullock, indygreg, durin42, kevincox, mercurial-devel


More information about the Mercurial-devel mailing list