D7866: rust-pathauditor: add Rust implementation of the `pathauditor`

martinvonz (Martin von Zweigbergk) phabricator at mercurial-scm.org
Thu Jan 23 13:16:18 EST 2020


This revision now requires changes to proceed.
martinvonz added inline comments.
martinvonz requested changes to this revision.
martinvonz added subscribers: yuja, martinvonz.

INLINE COMMENTS

> path_auditor.rs:55
> +            .collect();
> +        if !path.split_drive().0.is_empty()
> +            || [&b".hg"[..], &b".hg."[..], &b""[..]]

As @yuja pointed out on D7864 <https://phab.mercurial-scm.org/D7864>, it's weird to have `split_drive()` on `HgPath`, because that is supposed to be a repo-relative path. I think it would be better to move it into this file (or elsewhere). Can be done in a follow-up.

> path_auditor.rs:70
> +                let mut first = split.next().unwrap().to_owned();
> +                first.make_ascii_uppercase();
> +                let last = split.next().unwrap();

nit: chain `to_ascii_uppercase()` onto the previous line (and drop `mut`) instead

> path_auditor.rs:72
> +                let last = split.next().unwrap();
> +                if last.iter().all(|b| b.is_ascii_digit())
> +                    && [&b"HG"[..], &b"HG8B6C"[..]].contains(&first.as_ref())

or is `u8::is_ascii_digit` preferred? (just a question; i don't know which is preferred)

> path_auditor.rs:73
> +                if last.iter().all(|b| b.is_ascii_digit())
> +                    && [&b"HG"[..], &b"HG8B6C"[..]].contains(&first.as_ref())
> +                {

Is this line equivalent to `&& (first == b"HG" || first == b"HG8B6C")`? If so, I would strongly prefer that.

Same comment on line 56, actually (I reviewed out of of order). Would probably be helpful to extract `&lower_clean(parts[0]).as_ref()` to a variable anyway.

> path_auditor.rs:83-84
> +        {
> +            let lower_parts: Vec<_> =
> +                parts.iter().map(|p| lower_clean(p)).collect();
> +            for pattern in [b".hg".to_vec(), b".hg.".to_vec()].iter() {

Would we ever get different results from `lower_clean(path).split()` and `split(path).map(lower_clean)`? If not, shouldn't we reuse the `lower_clean()`ed result from above and call `split()` on that?

> path_auditor.rs:86
> +            for pattern in [b".hg".to_vec(), b".hg.".to_vec()].iter() {
> +                if lower_parts[1..].contains(pattern) {
> +                    let pos = lower_parts

What's the `1` about? It seems to say it's okay if the path has `.hg` as its first component. Why?

> path_auditor.rs:87-90
> +                    let pos = lower_parts
> +                        .iter()
> +                        .position(|part| part == pattern)
> +                        .unwrap();

Can we eliminate this by using `.enumerate()` on line 85?

> path_auditor.rs:91-95
> +                    let base = lower_parts[..pos]
> +                        .iter()
> +                        .fold(HgPathBuf::new(), |acc, p| {
> +                            acc.join(HgPath::new(p))
> +                        });

Will it be confusing that this is not necessarily a prefix of `path`? Specifically, it seems like it's going to be lower-case and with possibly different path separators.

> path_auditor.rs:114
> +                &parts[..index + 1].join(&(std::path::MAIN_SEPARATOR as u8));
> +            let prefix = HgPath::new(prefix);
> +            if self.audited_dirs.contains(prefix) {

Should this be a `PathBuf`? It's weird to have a `HgPath` containing `std::path::MAIN_SEPARATOR`.

> path_auditor.rs:123-124
> +        self.audited.insert(path.to_owned());
> +        // Only add prefixes to the cache after checking everything: we don't
> +        // want to add "foo/bar/baz" before checking if there's a "foo/.hg"
> +        self.audited_dirs.extend(prefixes);

It looks like we check the prefixes in order, so first "foo", then "foo/bar", then "foo/bar/baz". We'd return early if we find "foo/.hg", before we get to "foo/bar", no? What am I missing?

> path_auditor.rs:144
> +                // EINVAL can be raised as invalid path syntax under win32.
> +                // They must be ignored for patterns can be checked too.
> +                if e.kind() != std::io::ErrorKind::NotFound

Could you replace "for" by "because" if that's what it means here. Or fix the sentence if that's not what you mean.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7866/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7866

To: Alphare, #hg-reviewers, kevincox, martinvonz
Cc: martinvonz, yuja, durin42, kevincox, mercurial-devel


More information about the Mercurial-devel mailing list