D7058: rust-dirstate-status: add first Rust implementation of `dirstate.status`

martinvonz (Martin von Zweigbergk) phabricator at mercurial-scm.org
Fri Oct 11 20:22:21 EDT 2019


martinvonz added a comment.


  I've slowly worked my way through `walk_explicit()` so far, but I had a few comments there, so I'll let you take a look at those first while I take a break.

INLINE COMMENTS

> status.rs:22
> +/// TODO subrepos
> +pub fn walk_explicit<P: AsRef<Path>>(
> +    files: Vec<HgPathBuf>,

Does this need to be public?

> status.rs:22-28
> +pub fn walk_explicit<P: AsRef<Path>>(
> +    files: Vec<HgPathBuf>,
> +    dmap: &mut DirstateMap,
> +    root_dir: P,
> +) -> std::io::Result<HashMap<HgPathBuf, Option<HgMetadata>>>
> +where
> +    P: Sync,

I'm completely new to Rust (just finished going through the tutorial), so please excuse naive questions...

What's the advantage of splitting up the definition of `P` like this? Why not constrain it as `AsRef<Path> + Sync` either on line 22 or line 28?

> status.rs:33
> +    // Sentinel value to prevent subrepo walks.
> +    results.insert(HgPathBuf::from_bytes(b".hg"), None);
> +

Looks like this will end up getting returned. Do we want that? It seems surprising to me.

> status.rs:36-38
> +        let mut v = Vec::new();
> +        v.push(HgPathBuf::new());
> +        v

Isn't this the same as `vec!(HgPathBuf::new())`? Do we dislike the `vec!` macro? Or macros in general?

> status.rs:36-38
> +        let mut v = Vec::new();
> +        v.push(HgPathBuf::new());
> +        v

What is this case about? If no files are specified, we want to return stat information for the (repo) root directory? That seems surprising. Does the caller depend on that? Does it have to?

> status.rs:40
> +    } else {
> +        files.to_vec()
> +    };

It already is a vector, so no need to call `.to_vec()` here?

> status.rs:43
> +
> +    let stats_res: std::io::Result<
> +        Vec<Option<(&HgPathBuf, &HgPathBuf, std::io::Result<Metadata>)>>,

Please add a comment explaining what the type is. What does the outer `Result` mean? What does the `Option` mean? What are the two `HgPathBuf`s? What does the inner `Result` mean?

> status.rs:44
> +    let stats_res: std::io::Result<
> +        Vec<Option<(&HgPathBuf, &HgPathBuf, std::io::Result<Metadata>)>>,
> +    > = files

Do we want to keep the trailing comma?

> status.rs:54-55
> +            }
> +            let target_filename =
> +                root_dir.as_ref().join(hg_path_to_path_buf(filename)?);
> +

Do you think it would be easier to read and reason about this whole function if we did up to here in one step and the `symlink_metadata()` in a separate step after? By "step", I mean either in a separate `.map()` or in a separate `par_iter()`. I think it might be easier to follow if any errors from `hg_path_to_path_buf()` were checked earlier.

> status.rs:55
> +            let target_filename =
> +                root_dir.as_ref().join(hg_path_to_path_buf(filename)?);
> +

It doesn't matter yet, but should this be `normalized` instead of `filename`?

> status.rs:63
> +        })
> +        .collect();
> +

https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.collect talks about using a "turbofish". I don't know how specific you *want* to be about the type here (do you want all the details on line 44? If not, you can do `.collect::<std::io::Result<Vec<_>>>();` here instead.

> status.rs:65
> +
> +    for res in stats_res? {
> +        match res {

The tutorial seemed to put the `?` where the expression was defined, i.e. at the end of line 63 here.  Seems generally better to me so the variable gets a simpler type and so it's only evaluated once even if the variable is used in multiple places. (Maybe changing it requires the turbofish?)

> status.rs:66
> +    for res in stats_res? {
> +        match res {
> +            Some((normalized, _, Ok(stat))) => {

Are we going to use the second value (the unnormalized filename) later? Otherwise I'd prefer to remove it from the tuple. Actually, I'd prefer to do that now anyway unless you already have patches that start using it.

> status.rs:78-80
> +                    if dmap.contains_key(normalized) {
> +                        results.insert(normalized.to_owned(), None);
> +                    }

This looks just like the `is_dir()` case. Should we merge them (i.e. just check `if stat.is_file() { ... } else { ... }`)?

> status.rs:83
> +            }
> +            None => {}
> +            Some((nf, _, Err(_))) => {

Are we going to do more here later? If not, we can just drop the `Option`-wrapping in `stats_res` (perhaps by adding a `.filter()` step to the iteration).

> status.rs:84
> +            None => {}
> +            Some((nf, _, Err(_))) => {
> +                if dmap.contains_key(nf) {

Are we going to care about the specific error type/message soon? Otherwise, could we replace it by an `Option`?

> status.rs:84
> +            None => {}
> +            Some((nf, _, Err(_))) => {
> +                if dmap.contains_key(nf) {

Call it `normalized` here too for consistency (and `normalized` is clearer than `nf`)

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7058/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7058

To: Alphare, #hg-reviewers
Cc: martinvonz, durin42, kevincox, mercurial-devel


More information about the Mercurial-devel mailing list