D6271: rust-filepatterns: add a Rust implementation of pattern-related utils

kevincox (Kevin Cox) phabricator at mercurial-scm.org
Sat Apr 20 16:22:26 EDT 2019


kevincox requested changes to this revision.
kevincox added inline comments.
This revision now requires changes to proceed.

INLINE COMMENTS

> filepatterns.rs:18
> +    };
> +}
> +

Why not use https://docs.rs/regex/1.1.6/regex/fn.escape.html? If you are worried about performance you can use https://docs.rs/regex-syntax/0.6.6/regex_syntax/fn.escape_into.html which is quite fast.

> filepatterns.rs:21
> +const GLOB_REPLACEMENTS: &[(&[u8], &[u8])] =
> +    &[(b"*/", b"(?:.*/)?"), (b"*", b".*"), (b"", b"[^/]*")];
> +

It is worth putting a comment that these are matched in order.

> filepatterns.rs:29
> +    Glob,
> +    // Glob that matches at any suffix of the path (still anchored at slashes)
> +    Path,

Put doc comments before the thing that they are documenting and use `///`

> filepatterns.rs:51
> +                        input = &input[source.len()..];
> +                        res.extend(repl.iter());
> +                        break;

I suspect this `.iter()` isn't required.

> filepatterns.rs:61
> +                    .enumerate()
> +                    .position(|(i, b)| *b == b']' && i != 0)
> +                {

input.iter().skip(1).position(|b| *b == b']')

> filepatterns.rs:226
> +
> +pub fn parse_pattern_file_contents(
> +    lines: &str,

This function appears to be way more complicated then it needs to be. I have tried to highlight the changes in comments below but I have a full version at https://rust.godbolt.org/z/08MrDP. I haven't actually run the code but I think my version has the same functionality as this one.

> filepatterns.rs:239
> +        let line_number = line_number + 1;
> +        let mut line = line.to_string();
> +        let mut syntax = current_syntax.to_string();

If you are worried about performance you can use a buffer outside the loop and replace it with `line.replace_range(.., line_str)`. This saves continuously reallocating the string.

> filepatterns.rs:240
> +        let mut line = line.to_string();
> +        let mut syntax = current_syntax.to_string();
> +        let mut syntax = syntax.as_ref();

This copy is unnecessary and having the lifetime start so early is confusing. The only time this is used is as a temporary in the `syntax:` case and to set `line_syntax` where it is always unchanged from the default. This variable should just be removed.

> filepatterns.rs:241
> +        let mut syntax = current_syntax.to_string();
> +        let mut syntax = syntax.as_ref();
> +

This is also unnecessary. Because of the above comment.

> filepatterns.rs:249
> +        }
> +        line = str::trim_end(line.as_ref()).to_string();
> +

You can just call `str` methods on `String` because they implement `Deref<str>`.

  line = line.as_ref().trim_end().to_string();

> filepatterns.rs:259
> +            if let Some(rel_syntax) = SYNTAXES.get(syntax) {
> +                current_syntax = Cow::Owned(rel_syntax.to_string());;
> +            } else if warn {

`rel_syntax: &'static str` so this can be `Cow::Borrowed`. Actually `current_syntax` is only ever set to static lifetime strings so you don't have to use `Cow` at all.

> filepatterns.rs:267
> +        let mut line_syntax = syntax;
> +        let mut final_line = line.clone();
> +

This clone is unnecessary it is always a reference to line from herein so you can just use `&str`.

> filepatterns.rs:282
> +        inputs.push((
> +            line_syntax.to_string() + &final_line,
> +            line_number,

Slight preference for `format!()`. In theory it can do clever things like checking argument sizes to only allocate once. Either way I think it looks nice.

> filepatterns.rs:286
> +        ));
> +        current_syntax = Cow::Owned(syntax.to_string());
> +    }

I'm pretty sure this is redundant because the only time `syntax` is modified is in the `syntax:` case which A) Sets `current_syntax` and B) `continue`s.

> filepatterns.rs:299
> +        Ok(parse_pattern_file_contents(&contents, &file_path, warn))
> +    })?
> +}

Mapping to an always-ok result is odd. You should just map to the `parse_pattern_file_contents()`. Then you don't need to use `?` to unwrap the extra layer. However you then loose the error conversion which you can add back with `.map_err(|e| e.into())`.

https://rust.godbolt.org/z/Mflpga

Even better is just use the try for the read and call `parse_pattern_file_contents` at the function level.

  f.read_to_string(&mut contents)?;
  Ok(parse_pattern_file_contents(&contents, &file_path, warn))

> lib.rs:47
>  
> +pub type LineNumber = usize;
> +

This seems like overkill to me. I would either remove it or use `pub struct LineNumber(pub usize)` if you want to do this for type safety. However I don't think type safety is really needed here so I would just remove the alias.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6271

To: Alphare, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel


More information about the Mercurial-devel mailing list