D6271: rust-filepatterns: add a Rust implementation of pattern-related utils
kevincox (Kevin Cox)
phabricator at mercurial-scm.org
Sat Apr 20 16:22:26 EDT 2019
kevincox requested changes to this revision.
kevincox added inline comments.
This revision now requires changes to proceed.
INLINE COMMENTS
> filepatterns.rs:18
> + };
> +}
> +
Why not use https://docs.rs/regex/1.1.6/regex/fn.escape.html? If you are worried about performance you can use https://docs.rs/regex-syntax/0.6.6/regex_syntax/fn.escape_into.html which is quite fast.
> filepatterns.rs:21
> +const GLOB_REPLACEMENTS: &[(&[u8], &[u8])] =
> + &[(b"*/", b"(?:.*/)?"), (b"*", b".*"), (b"", b"[^/]*")];
> +
It is worth putting a comment that these are matched in order.
> filepatterns.rs:29
> + Glob,
> + // Glob that matches at any suffix of the path (still anchored at slashes)
> + Path,
Put doc comments before the thing that they are documenting and use `///`
> filepatterns.rs:51
> + input = &input[source.len()..];
> + res.extend(repl.iter());
> + break;
I suspect this `.iter()` isn't required.
> filepatterns.rs:61
> + .enumerate()
> + .position(|(i, b)| *b == b']' && i != 0)
> + {
input.iter().skip(1).position(|b| *b == b']')
> filepatterns.rs:226
> +
> +pub fn parse_pattern_file_contents(
> + lines: &str,
This function appears to be way more complicated then it needs to be. I have tried to highlight the changes in comments below but I have a full version at https://rust.godbolt.org/z/08MrDP. I haven't actually run the code but I think my version has the same functionality as this one.
> filepatterns.rs:239
> + let line_number = line_number + 1;
> + let mut line = line.to_string();
> + let mut syntax = current_syntax.to_string();
If you are worried about performance you can use a buffer outside the loop and replace it with `line.replace_range(.., line_str)`. This saves continuously reallocating the string.
> filepatterns.rs:240
> + let mut line = line.to_string();
> + let mut syntax = current_syntax.to_string();
> + let mut syntax = syntax.as_ref();
This copy is unnecessary and having the lifetime start so early is confusing. The only time this is used is as a temporary in the `syntax:` case and to set `line_syntax` where it is always unchanged from the default. This variable should just be removed.
> filepatterns.rs:241
> + let mut syntax = current_syntax.to_string();
> + let mut syntax = syntax.as_ref();
> +
This is also unnecessary. Because of the above comment.
> filepatterns.rs:249
> + }
> + line = str::trim_end(line.as_ref()).to_string();
> +
You can just call `str` methods on `String` because they implement `Deref<str>`.
line = line.as_ref().trim_end().to_string();
> filepatterns.rs:259
> + if let Some(rel_syntax) = SYNTAXES.get(syntax) {
> + current_syntax = Cow::Owned(rel_syntax.to_string());;
> + } else if warn {
`rel_syntax: &'static str` so this can be `Cow::Borrowed`. Actually `current_syntax` is only ever set to static lifetime strings so you don't have to use `Cow` at all.
> filepatterns.rs:267
> + let mut line_syntax = syntax;
> + let mut final_line = line.clone();
> +
This clone is unnecessary it is always a reference to line from herein so you can just use `&str`.
> filepatterns.rs:282
> + inputs.push((
> + line_syntax.to_string() + &final_line,
> + line_number,
Slight preference for `format!()`. In theory it can do clever things like checking argument sizes to only allocate once. Either way I think it looks nice.
> filepatterns.rs:286
> + ));
> + current_syntax = Cow::Owned(syntax.to_string());
> + }
I'm pretty sure this is redundant because the only time `syntax` is modified is in the `syntax:` case which A) Sets `current_syntax` and B) `continue`s.
> filepatterns.rs:299
> + Ok(parse_pattern_file_contents(&contents, &file_path, warn))
> + })?
> +}
Mapping to an always-ok result is odd. You should just map to the `parse_pattern_file_contents()`. Then you don't need to use `?` to unwrap the extra layer. However you then loose the error conversion which you can add back with `.map_err(|e| e.into())`.
https://rust.godbolt.org/z/Mflpga
Even better is just use the try for the read and call `parse_pattern_file_contents` at the function level.
f.read_to_string(&mut contents)?;
Ok(parse_pattern_file_contents(&contents, &file_path, warn))
> lib.rs:47
>
> +pub type LineNumber = usize;
> +
This seems like overkill to me. I would either remove it or use `pub struct LineNumber(pub usize)` if you want to do this for type safety. However I don't think type safety is really needed here so I would just remove the alias.
REPOSITORY
rHG Mercurial
REVISION DETAIL
https://phab.mercurial-scm.org/D6271
To: Alphare, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel
More information about the Mercurial-devel
mailing list