D7178: [RFC] rust-matchers: add `Matcher` trait and implement `AlwaysMatcher`

martinvonz (Martin von Zweigbergk) phabricator at mercurial-scm.org
Wed Oct 30 03:12:51 EDT 2019


martinvonz added inline comments.
martinvonz added subscribers: spectral, martinvonz.

INLINE COMMENTS

> matchers.rs:19-22
> +    Recursive,
> +    Empty,
> +    Set(HashSet<HgPathBuf>),  // Should we implement a `NonEmptyHashSet`?
> +    This,

Could you add a comment explaining what each value means? (I *think* `Recursive` means to match all files recursively here, `This` means to visit this directory and all subdirectories, `Empty` means to not visit any subdirectories, and `Set` means to visit exactly that set.)

> matchers.rs:20-21
> +    Recursive,
> +    Empty,
> +    Set(HashSet<HgPathBuf>),  // Should we implement a `NonEmptyHashSet`?
> +    This,

Is `Empty` needed as an optimization? Or could we just use an empty set?

> matchers.rs:25
> +
> +pub trait Matcher {
> +    /// Returns whether `filename` is in `file_set`

It may be better to remove everything we're not sure we'll need to start with. That way it's easier for you to add new matchers and we won't have to clean up later if we never end up needing some of these things. Things I think we can remove to start with:

`roots()`
`prefix()`
`anypats()`

> matchers.rs:27
> +    /// Returns whether `filename` is in `file_set`
> +    fn exact_match(&self, _filename: impl AsRef<HgPath>) -> bool;
> +    fn file_set(&self) -> Option<HashSet<&HgPath>> {

Drop the `_` prefix from `_filename`?

> matchers.rs:28
> +    fn exact_match(&self, _filename: impl AsRef<HgPath>) -> bool;
> +    fn file_set(&self) -> Option<HashSet<&HgPath>> {
> +        None

`Option<CollectionType<T>>` is usually suspicious, IMO. Can we just return an empty set instead of `None`?

> matchers.rs:32
> +    fn roots(&self) -> Option<HashSet<&HgPath>> {
> +        None
> +    }

I think `{''}` (in Python syntax) makes more sense here.

> matchers.rs:35
> +    /// Returns whether `filename` is matched by this matcher
> +    fn match_fn(&self, _filename: impl AsRef<HgPath>) -> bool {
> +        true

Drop the `_` prefix here too. Does the compiler really warn about unused arguments even in trait definitions?

> matchers.rs:35
> +    /// Returns whether `filename` is matched by this matcher
> +    fn match_fn(&self, _filename: impl AsRef<HgPath>) -> bool {
> +        true

Call it `matches()` instead?

> matchers.rs:36
> +    fn match_fn(&self, _filename: impl AsRef<HgPath>) -> bool {
> +        true
> +    }

Default to `false` like the Python version does?

> matchers.rs:39-49
> +    /// Decides whether a directory should be visited based on whether it
> +    /// has potential matches in it or one of its subdirectories. This is
> +    /// based on the match's primary, included, and excluded patterns.
> +    ///
> +    /// Returns `VisitDir::Recursive` if the given directory and all
> +    /// subdirectories should be visited.
> +    /// Otherwise returns `VisitDir::This` or `VisitDir::No` indicating whether

Maybe we can try without this one? It doesn't seem like we need `visit_dir()` now that we have `visit_children_set()` and it seems to have a superset of the functionality. I asked @spectral about that and they thought we should only need the latter, except that it may be slower in some cases, since it needs to return a set of children to visit. However, it seems (to me) that that cost should be made up for by the speed gained by not visiting uninteresting subdirectories. So I would suggest not adding `visit_dir()` to start with and we can add it later if it seems useful in some case. We should also see if we can switch over existing uses of `visit_dir()` to use `visit_children_set()` in the Python code.

> matchers.rs:104
> +    /// optimization might be possible.
> +    fn always(&self) -> bool {
> +        false

Should this be `is_always()` per Rust naming conventions?

> matchers.rs:112-122
> +    /// Matcher will match the paths in `files_set()` recursively: optimization
> +    /// might be possible
> +    fn prefix(&self) -> bool {
> +        false
> +    }
> +    /// None of `.always()`, `.is_exact()`, and `.prefix()` is true:
> +    /// optimizations will be difficult.

I think `is_always()` and `is_exact()` are very often useful for optimizations, but I'm not sure about these two.

> matchers.rs:114
> +    /// might be possible
> +    fn prefix(&self) -> bool {
> +        false

Similarly, should this be `is_prefix()`?

> matchers.rs:119
> +    /// optimizations will be difficult.
> +    fn any_pats(&self) -> bool {
> +        // TODO rename? It's confusing

And here

> matchers.rs:130-132
> +    fn exact_match(&self, _filename: impl AsRef<HgPath>) -> bool {
> +        false
> +    }

Seems like a good default implementation. Put this in the trait instead?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7178/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7178

To: Alphare, #hg-reviewers
Cc: martinvonz, spectral, durin42, kevincox, mercurial-devel


More information about the Mercurial-devel mailing list