[PATCH] match: adding non-recursive directory matching

Martin von Zweigbergk martinvonz at google.com
Mon Nov 7 22:38:00 UTC 2016


On Wed, Oct 26, 2016 at 2:03 PM Rodrigo Damazio via Mercurial-devel <
mercurial-devel at mercurial-scm.org> wrote:

> On Wed, Oct 26, 2016 at 12:17 AM, FUJIWARA Katsunori <
> foozy at lares.dti.ne.jp> wrote:
>
>
> At Tue, 25 Oct 2016 19:51:59 -0700,
> Rodrigo Damazio wrote:
> >
> > On Tue, Oct 25, 2016 at 4:31 PM, FUJIWARA Katsunori <
> foozy at lares.dti.ne.jp>
> > wrote:
> >
> > >
> > > At Mon, 24 Oct 2016 10:34:52 -0700,
> > > Rodrigo Damazio wrote:
> > > >
> > > > [1  <text/plain; UTF-8 (7bit)>]
> > > > It sounds like we'd like to do 3 somewhat orthogonal things:
> > > > - allow user to specify the directory the pattern is relative to
> > > > (root/cwd/any)
> > > > - allow the user to specify recursiveness/non-recursiveness
> consistently
> > > > (not covered by the *path patterns, but could be the defined
> behavior for
> > > > the globs)
> > > > - clean up the matcher API (discussed during Sprint)
> > > >
> > > > Doing all 3 together would probably take some time and a lot of
> > > > back-and-forth, so I'm wondering if it'd be ok to start by updating
> this
> > > > patch to implement "rootglob" with consistent recursiveness
> behavior, and
> > > > we can then more slowly add the other patterns and converge on a
> cleaner
> > > > API?
>
>
I'm obviously biased by working on the same project as you, but starting
with rootglob makes sense to me. The matcher API cleanup, whatever that is,
will probably be insignificantly harder because of rootglob. Even if we add
all nine (or more?) suggested patterns suggested by Foozy, I don't think it
will matter much for the refactoring.

However, I think the rootglob pattern has more impact to our users than it
does to our codebase, so what we may want to do now is to document it
better. I haven't thought much about it, but your patch didn't seem to
include any documentation. I'm thinking that one of the tables in this
thread should be in `hg help patterns` (i.e. mercurial/help/patterns.txt)
and we can perhaps think about how we want that text to look once we add
the other patterns Foozy suggested.

What do others think?



> > >
> > > (let's suspend posting revised series while code freeze period, to
> > > focus on stabilization :-))
> > >
> >
> > Sure, I understand you're under the freeze. Feel free to prioritize
> > reviewing my patches appropriately.
> > (notice the new patch is based on default, not stable)
> >
> >     https://www.mercurial-scm.org/wiki/TimeBasedReleasePlan#Code_Freeze
> > >
> > > In my previous reply, I assume that newly introduced syntaxes do:
> > >
> > >   - match recursively by default regardless of the way of passing
> > >     (command line, -I/-X, ....), because of similarity with almost all
> > >     of existing syntaxes
> > >
> > >     Only glob/relglob as PATTERN in command line require "end of name"
> > >     matching.
> > >
> > >   - require additional "-eon" ("end of name") suffix for non-recursive
> > >     matching (e.g. "rootglob-eon", "cwdre-eon", "anypath-eon", ...)
> > >
> > > But according to your revised patch, "rootglob" syntax matches
> > > non-recursively. Would you assume as below ?
> > >
> > >   - newly introduced syntaxes match non-recursively by default
> > >   - recursive matching requires any additional suffix (e.g.
> "-recursive")
> > >
> >
> > Ah, the assumption is slightly different - the assumption is that for
> glob
> > types, specifically, we're doing a full match, so that to get
> recursiveness
> > at the end the user should specify /** or similar. This allows the user
> to
> > do recursive or non-recursive matching by using * or ** as appropriate.
> > I'll suggest that regex types also do a full match, and the user can end
> > them with .* if they want it to be a prefix.
> > I believe this is simpler and more flexible than having 18 different
> > pattern types just to account for the different behavior of the
> matching. I
> > considered that we could, likewise, make partial matching be the default,
> > but I decided against that when making the patch because then it'd be
> > impossible to make them non-recursive by a modifier, without doubling the
> > number of matchers as you suggested.
>
> It is right that glob and re pattern can switch
> recursive/non-recursive by its own pattern, and controlling
> recursive-ness by extra suffix of syntax name is redundant for them.
>
> I also forgot that adding "(?:$)" or "(?:$|/)" to "re:" pattern
> correctly according to recursive-ness might cause trouble for
> complicated regexp :-<
>
>
>
> > On the other hand, you assume that newly introduced *path syntaxes
> > > will be recursive, as below. Would you assume that default
> > > recursive-ness is different between *glob and *path syntaxes ?
> > >
> >
> > path would be recursive, as will glob that ends with ** or regex that
> ends
> > with .*
> >
> >
> > > > Also, for discussion: I assume the *path patterns will be recursive
> when
> > > > they reference a directory. Do we also want a non-recursive
> equivalent
> > > > (rootexact, rootfiles, rootnonrecursive or something like that)?
>
> How about adding syntax type "file"/"dir" ?
>
>   ===== ============= =================
>   type  for recursive for non-recursive
>   ===== ============= =================
>   glob  use "**"      use "*"
>   re    omit "$"      append "$"
>   path  always(*1)    ----
>   file  ----          always
>   dir   always(*2)    ----
>   ===== ============= =================
>
>   (*1) match against both file and directory
>   (*2) match against only directory
>
> "dir" might be overkill, though :-) (is it useful in resolving name
> collision at merging or so ?)
>
>
> foozy, thanks so much for the review and discussion.
> Sounds like we do agree about the glob behavior then, so let me know if
> you'd like any changes to the latest version of this patch, other than
> improving documentation. I'm happy to send an updated version as soon as
> someone is ready to review.
>
> I understand the difference between dir and path (and between the original
> version of this patch and file) would be that they'd validate the type of
> entry being matched (so that passing a filename to dir or dir name to file
> would be an error) - is that what you have in mind? The current matchers
> don't have a good mechanism to verify the type, so some significant
> rewiring would need to be done to pass that information down.
> Another thought is that by supporting file and dir, you're incentivizing
> developers to rely on smarter name collision support (and also case
> collisions) - one could argue that there's no reason for the complexity
> caused by that.
>
>
> > >
> > > IMHO, making patch description explain how recursive matching will be
> > > controlled in the future helps reviewers to evaluate your patch.
> > >
> >
> > I'm happy to update the documentation on my patch to better reflect the
> > full-matching characteristic, if it's OK to push a new version of the
> patch
> > :)
> >
> >
> > > BTW, bikeshedding about name of additional suffix:
> > >
> > >   - for non-recursive matching, in "recursive matching by default" case
> > >
> > >     - "-eon"
> > >
> > >       "end of name matching" is my coined word only for explanation,
> > >       and let's choose better one :-)
> > >
> > >     - "-exact" for non-recursive matching
> > >
> > >       this might confuse developers, because current implementation
> > >       already uses "exact" term as "matching without any special
> > >       handling".
> > >
> > >         https://selenic.com/repo/hg/file/438173c41587/mercurial/
> > > match.py#l100
> > >
> > >     - "-nonrecursive"
> > >
> > >       this is too long, isn't it ?
> > >
> > >     - "-file"
> > >
> > >       this seems better (short and understandable for end users)
> > >
> > >   - for recursive matching, in "non-recursive matching by default" case
> > >
> > >     - "-recursive"
> > >
> > >       this is too long, isn't it ?
> > >
> > >     - "-dir"
> > >
> > >       this seems better (short and understandable for end users)
> >
> >
> > > > Thanks
> > > > Rodrigo
> > > >
> > > >
> > > >
> > > > On Mon, Oct 24, 2016 at 6:21 AM, Pierre-Yves David <
> > > > pierre-yves.david at ens-lyon.org> wrote:
> > > >
> > > > >
> > > > >
> > > > > On 10/21/2016 05:13 PM, FUJIWARA Katsunori wrote:
> > > > >
> > > > >> At Tue, 18 Oct 2016 10:12:07 -0400,
> > > > >> Augie Fackler wrote:
> > > > >>
> > > > >>>
> > > > >>> On Tue, Oct 18, 2016 at 9:52 AM, Yuya Nishihara <yuya at tcha.org>
> > > wrote:
> > > > >>>
> > > > >>>> On Tue, 18 Oct 2016 09:40:36 -0400, Augie Fackler wrote:
> > > > >>>>
> > > > >>>>> On Oct 18, 2016, at 09:38, Yuya Nishihara <yuya at tcha.org>
> wrote:
> > > > >>>>>>
> > > > >>>>>>> After coordinating on irc to figure out what this proposal
> > > actually
> > > > >>>>>>> is, I've noticed that the semantics of this "exact" proposal
> are
> > > > >>>>>>> exactly what "glob" does today, which means (I think) that
> > > > >>>>>>> "files:foo/bar" should be representable as "glob:foo/bar/*" -
> > > what am
> > > > >>>>>>> I missing?
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>> Maybe we want a "glob" relative to the repo root?
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>> As far as I can tell, it already is. "relglob:" is relative to
> your
> > > > >>>>> location in the repo according to the docs.
> > > > >>>>>
> > > > >>>>
> > > > >>>> Unfortunately that isn't.
> > > > >>>>
> > > > >>>>         'glob:<glob>' - a glob relative to cwd
> > > > >>>>         'relglob:<glob>' - an unrooted glob (*.c matches C
> files in
> > > all
> > > > >>>> dirs)
> > > > >>>>
> > > > >>>> Don't ask me why. ;-)
> > > > >>>>
> > > > >>>
> > > > >>> Oh wat. It looks like narrowhg might change this behavior in
> narrowed
> > > > >>> repositories, thus my additional confusion.
> > > > >>>
> > > > >>> Maybe we should add "absglob" that is always repo-root-absolute.
> How
> > > > >>> do we feel about that overall?
> > > > >>>
> > > > >>
> > > > >> FYI, current pattern matching is implemented as below. This was
> > > > >> chatted in "non-recursive directory matching" session of 4.0
> sprint,
> > > > >> and sorry for my late posting of this translation from
> > > > >> http://d.hatena.ne.jp/flying-foozy/20140107/1389087728 in
> Japanese,
> > > as
> > > > >> my backlog of the last sprint.
> > > > >>
> > > > >>   ============ ======= ======= ===========
> > > > >>   pattern type root-ed cwd-ed  any-of-path
> > > > >>   ============ ======= ======= ===========
> > > > >>   wildcard     ---     glob    relglob
> > > > >>   regexp       re      ---     relre
> > > > >>   raw string   path    relpath ---
> > > > >>   ============ ======= ======= ===========
> > > > >>
> > > > >>   If rule is read in from file (e.g. .hgignore):
> > > > >>
> > > > >>     * "glob" is treated as "relglob"
> > > > >>     * "re" is treated as "relre"
> > > > >>
> > > > >>   This is mentioned in "hg help patterns" and "hg help hgignore",
> but
> > > > >>   syntax name "relglob" and "relre" themselves aren't explained.
> > > > >>
> > > > >>   "end of name" matching is required:
> > > > >>
> > > > >>     * for glob/relglob as PATTERN (e.g. argument in command
> line), but
> > > > >>     * not for glob/relglob as INCLUDES/EXCLUDES, or other pattern
> > > syntaxes
> > > > >>
> > > > >>   For example, file "foo/bar/baz" is:
> > > > >>
> > > > >>     * not matched at "hg files glob:foo/bar"
> > > > >>     * but matched at "hg file -I glob:foo/bar"
> > > > >>
> > > > >>   This isn't mentioned in any help document :-<, and the latter
> seems
> > > > >>   to cause the issue mentioned in this patch series.
>
>
`hg help patterns` actually has the following section that I suspect is
meant to say this, although it definitely could have been made clearer:

    All patterns, except for "glob:" specified in command line (not for "-I"
    or "-X" options), can match also against directories: files under
matched
    directories are treated as matched.



> > > > >>
> > > > >> How about introducing new systematic names like below to
> re-organize
> > > > >> current complicated mapping between names and matching ? (and
> enable
> > > > >> "end of name" matching by "-eon" suffix or so)
> > > > >>
> > > > >>   ============ ======== ======= ===========
> > > > >>   pattern type root-ed  cwd-ed  any-of-path
> > > > >>   ============ ======== ======= ===========
> > > > >>   wildcard     rootglob cwdglob anyglob
> > > > >>   regexp       rootre   cwdre   anyre
> > > > >>   raw string   rootpath cwdpath anypath
> > > > >>   ============ ======== ======= ===========
> > > > >>
> > > > >
> > > > > Moving toward a more regular and clear feature set and naming
> seems a
> > > win.
> > > > > I'm +1 for moving in that direction.
> > > > >
> > > > > Cheers,
> > > > >
> > > > > --
> > > > > Pierre-Yves David
> > > > >
> > > > [2  <text/html; UTF-8 (quoted-printable)>]
> > > >
> > >
> > > ----------------------------------------------------------------------
> > > [FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp
> > >
>
> ----------------------------------------------------------------------
> [FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20161107/81264ff5/attachment.html>


More information about the Mercurial-devel mailing list