[PATCH] match: adding non-recursive directory matching

Martin von Zweigbergk martinvonz at google.com
Thu Jan 26 14:24:20 EST 2017


On Thu, Jan 26, 2017 at 11:19 AM, FUJIWARA Katsunori
<foozy at lares.dti.ne.jp> wrote:
>
> At Wed, 25 Jan 2017 20:54:37 -0800,
> Martin von Zweigbergk wrote:
>>
>> On Mon, Jan 23, 2017 at 5:02 PM, Rodrigo Damazio via Mercurial-devel
>> <mercurial-devel at mercurial-scm.org> wrote:
>> > Getting back to this after the end-of-year hiatus (yes, I know it happens to
>> > be during another code freeze :) I seem to have good timing).
>> >
>> > On Tue, Dec 27, 2016 at 2:14 AM, Pierre-Yves David
>> > <pierre-yves.david at ens-lyon.org> wrote:
>> >>
>> >>
>> >>
>> >> On 12/21/2016 04:21 AM, Rodrigo Damazio wrote:
>> >>>
>> >>>     If I got these two pieces right, it looks like we could just apply
>> >>>     the improvement to 'visitdir' to 'set:your/glob/*' and have your
>> >>>     usecase filled while not jumping into UI changes. Would that work
>> >>>     for you ?
>> >>>
>> >>>
>> >>> Not without a third set of changes, since set expansion doesn't use
>> >>> visitdir (or the matcher being built) at all - the dependency is that
>> >>> building the matcher depends on expanding the set (and thus the set
>> >>> can't depend on the matcher).
>> >>> It would technically be doable for re:, but I'm wary of getting into the
>> >>> business of parsing and special-casing regexes to assume what they match
>> >>> or don't.
>> >>
>> >>
>> >> Rodrigo and I chatted directly about this a couple of days ago. Here is a
>> >> quick summary of my new understanding of the situation.
>> >>
>> >> Fileset
>> >> -------
>> >>
>> >> Fileset (behind "set:") can give the right result, but it is powered by
>> >> not very modern code, it follow the old revset principle of "get everything
>> >> and then run filters on that everything". That does not fit Rodrigo needs at
>> >> all. It was easy to make 'set:' a bit smarter in the simple case but then we
>> >> get into the issue that the matcher class is using 'set:' in a strange,
>> >> non-lazy, way that does not use all the 'visitdir' feature Rodrigo/Google
>> >> needs.
>> >>
>> >> So in short, fileset needs a rework before being usable in a demanding
>> >> context.
>> >>
>> >>
>> >> Current path restriction capability
>> >> -----------------------------------
>> >>
>> >> The 'Match' class already have logic to restrict the path visited
>> >> (implemented in the 'visitdir' method). To clarify, this logic as no effect
>> >> on the returned match but is only an optimization for the directory we
>> >> visit. It seems to only kicks in when treemanifest is used.
>> >> This logic already works with a couple of patterns type (all pattern use
>> >> the same class). However, that logic currently do not support the case were
>> >> one want to select some subdirectory and skips the rest of the subtrees
>> >> under it.
>> >
>> >
>> > That is correct.
>> >
>> >> note: Rodrigo, you seems to have a good understanding of the logic. Do you
>> >> think you could document the involved attributes (_includeroots,
>> >> _includedirs, _excluderoots, etc) That would help a lot the next poor souls
>> >> looking at the code.
>> >
>> >
>> > Sure. It took me a while to understand that "roots" means "recursive
>> > directories" and "dirs" means "non-recursive directories" in that code - it
>> > all became much more clear after that. I'll be sure to add comments in my
>> > patch and/or rename the attributes.
>> >
>> >>
>> >>
>> >> Way forward
>> >> -----------
>> >>
>> >> That limitation in the matcher class optimization is the main blocker for
>> >> Rodrigo/Google needs. The optimization is independent of the UI part we
>> >> actually provides to user as all patterns use the same matcher class and
>> >> some existing class could already benefit from this optimization.
>> >>
>> >> Rodrigo seems to have a patch to update the matcher code to track and
>> >> optimize the "subdir-but-not-subtree" case. He has not submitted this patch
>> >> yet. Submitting that patches seems the next step to me. It will get the
>> >> matcher code in a state that can actually be used for the
>> >> narrowhg+treemanifest usecase.
>> >>
>> >> Once that code is in, it seems easy to make sure various patterns use it
>> >> basic, easily recognizable cases. We poked at updating the code to have
>> >> basic regexp matching a subtree recognized as such and that was quite easy.
>> >>
>> >>
>> >> Rodrigo, does that match your current understanding of the situation?
>> >
>> >
>> > It does.
>> > And just to clarify on the patches - I sent an initial patch, then after
>> > comments changes it significantly, so those are two different changes:
>> >
>> > The first implements a "files:" matcher which matches all files inside a
>> > directory, non-recursively. This has no wildcards, so special-casing it in
>> > visitdir and any other places needed results in clean and simple code ("if
>> > it's files:, don't recurse").
>> > The second implements "rootglob:" which allows any number of wildcards at
>> > point in the path, and is part of Foozy's plan for the new set of matchers.
>> > This adds some complexity in splitting dirs and roots (mentioned above) by
>> > having to parse the wildcards, and then the visitdir change looks less clean
>> > ("if it's a rootglob that has a single /* wildcard at the end, then don't
>> > recurse" - other cases are possible but start to get more complex).
>> >
>> > For these reasons, I'd still prefer to get "files:" or similar in, but I'm
>> > open for doing it either way. Please advise on the preferred way and I'll
>> > send an updated patch (2 patches really - one for the matcher, one for the
>> > visitdir optimization which makes it work with narrow).
>>
>> I'm fine with not doing rootglob:, but if I read foozy's proposal
>> right, the proposed files: will be what he would call rootfiles:. I
>> liked his proposal for a systematic naming, and if I got that right, I
>> think we should call it that from the beginning so we don't end up
>> with more aliases.
>
> Yeah, we should avoid confusion of naming !
>
>> I'd also like "rootfiles:foo" to *not* match the
>> file "foo", but only files in the directory "foo/". I mention that
>> because last I heard, he was unsure about that himself. Foozy, do you
>> agree?
>
> I don't have strong opinion against mode "XXX:", which matches against
> both "just this file" and "files directly under this directory"
>
> Therefore, I agree with adding new mode "XXX:", if it is needed (and
> Rodrigo/Google think so).
>
> But, name "files:" doesn't seem to suit for "XXX:" mode (at least, for
> me).
>
> Even if it matches against only "files directly under this directory",
> "files:" doesn't yet.
>
> Maybe, root cause of my bad feel is that "foo" of "files:foo" should
> be the directory in both cases, even though mode name "files:" has
> less "(under) this directory" flavor.
>
> If it is possible to combine 2 modes below for solving issues of
> Rodrigo/Google, I'm +1 for splitting "XXX:" into them, because naming
> "YYY:" should be easier than naming "XXX:".
>
>   - "file:" matching against "just this file"
>   - "YYY:" matching against "files directly under this directory"
>
> Are there any better (and short enough) names for "XXX:" or "YYY:"
> than "files:" ?
>
>   - "filesin:" (Files-In) for "YYY:" (<=> "files under" as "dir:") ?
>   - "fileorin:" (File-Or-(Files-)In) for "XXX:" ?
>
> Of course, I'm also OK with naming "XXX:" or "YYY:" as "files:", to go
> forward :-)

I like "file:" and "filesin:" for those two cases. But we should add
the "root" prefix so we don't have to do that later, right?

>
>>
>> >
>> > Thanks
>> > Rodrigo
>> >
>> >
>> > _______________________________________________
>> > Mercurial-devel mailing list
>> > Mercurial-devel at mercurial-scm.org
>> > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>> >
>>
>
> --
> ----------------------------------------------------------------------
> [FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp


More information about the Mercurial-devel mailing list