[PATCH] match: adding non-recursive directory matching

Martin von Zweigbergk martinvonz at google.com
Wed Jan 25 23:54:37 EST 2017


On Mon, Jan 23, 2017 at 5:02 PM, Rodrigo Damazio via Mercurial-devel
<mercurial-devel at mercurial-scm.org> wrote:
> Getting back to this after the end-of-year hiatus (yes, I know it happens to
> be during another code freeze :) I seem to have good timing).
>
> On Tue, Dec 27, 2016 at 2:14 AM, Pierre-Yves David
> <pierre-yves.david at ens-lyon.org> wrote:
>>
>>
>>
>> On 12/21/2016 04:21 AM, Rodrigo Damazio wrote:
>>>
>>>     If I got these two pieces right, it looks like we could just apply
>>>     the improvement to 'visitdir' to 'set:your/glob/*' and have your
>>>     usecase filled while not jumping into UI changes. Would that work
>>>     for you ?
>>>
>>>
>>> Not without a third set of changes, since set expansion doesn't use
>>> visitdir (or the matcher being built) at all - the dependency is that
>>> building the matcher depends on expanding the set (and thus the set
>>> can't depend on the matcher).
>>> It would technically be doable for re:, but I'm wary of getting into the
>>> business of parsing and special-casing regexes to assume what they match
>>> or don't.
>>
>>
>> Rodrigo and I chatted directly about this a couple of days ago. Here is a
>> quick summary of my new understanding of the situation.
>>
>> Fileset
>> -------
>>
>> Fileset (behind "set:") can give the right result, but it is powered by
>> not very modern code, it follow the old revset principle of "get everything
>> and then run filters on that everything". That does not fit Rodrigo needs at
>> all. It was easy to make 'set:' a bit smarter in the simple case but then we
>> get into the issue that the matcher class is using 'set:' in a strange,
>> non-lazy, way that does not use all the 'visitdir' feature Rodrigo/Google
>> needs.
>>
>> So in short, fileset needs a rework before being usable in a demanding
>> context.
>>
>>
>> Current path restriction capability
>> -----------------------------------
>>
>> The 'Match' class already have logic to restrict the path visited
>> (implemented in the 'visitdir' method). To clarify, this logic as no effect
>> on the returned match but is only an optimization for the directory we
>> visit. It seems to only kicks in when treemanifest is used.
>> This logic already works with a couple of patterns type (all pattern use
>> the same class). However, that logic currently do not support the case were
>> one want to select some subdirectory and skips the rest of the subtrees
>> under it.
>
>
> That is correct.
>
>> note: Rodrigo, you seems to have a good understanding of the logic. Do you
>> think you could document the involved attributes (_includeroots,
>> _includedirs, _excluderoots, etc) That would help a lot the next poor souls
>> looking at the code.
>
>
> Sure. It took me a while to understand that "roots" means "recursive
> directories" and "dirs" means "non-recursive directories" in that code - it
> all became much more clear after that. I'll be sure to add comments in my
> patch and/or rename the attributes.
>
>>
>>
>> Way forward
>> -----------
>>
>> That limitation in the matcher class optimization is the main blocker for
>> Rodrigo/Google needs. The optimization is independent of the UI part we
>> actually provides to user as all patterns use the same matcher class and
>> some existing class could already benefit from this optimization.
>>
>> Rodrigo seems to have a patch to update the matcher code to track and
>> optimize the "subdir-but-not-subtree" case. He has not submitted this patch
>> yet. Submitting that patches seems the next step to me. It will get the
>> matcher code in a state that can actually be used for the
>> narrowhg+treemanifest usecase.
>>
>> Once that code is in, it seems easy to make sure various patterns use it
>> basic, easily recognizable cases. We poked at updating the code to have
>> basic regexp matching a subtree recognized as such and that was quite easy.
>>
>>
>> Rodrigo, does that match your current understanding of the situation?
>
>
> It does.
> And just to clarify on the patches - I sent an initial patch, then after
> comments changes it significantly, so those are two different changes:
>
> The first implements a "files:" matcher which matches all files inside a
> directory, non-recursively. This has no wildcards, so special-casing it in
> visitdir and any other places needed results in clean and simple code ("if
> it's files:, don't recurse").
> The second implements "rootglob:" which allows any number of wildcards at
> point in the path, and is part of Foozy's plan for the new set of matchers.
> This adds some complexity in splitting dirs and roots (mentioned above) by
> having to parse the wildcards, and then the visitdir change looks less clean
> ("if it's a rootglob that has a single /* wildcard at the end, then don't
> recurse" - other cases are possible but start to get more complex).
>
> For these reasons, I'd still prefer to get "files:" or similar in, but I'm
> open for doing it either way. Please advise on the preferred way and I'll
> send an updated patch (2 patches really - one for the matcher, one for the
> visitdir optimization which makes it work with narrow).

I'm fine with not doing rootglob:, but if I read foozy's proposal
right, the proposed files: will be what he would call rootfiles:. I
liked his proposal for a systematic naming, and if I got that right, I
think we should call it that from the beginning so we don't end up
with more aliases. I'd also like "rootfiles:foo" to *not* match the
file "foo", but only files in the directory "foo/". I mention that
because last I heard, he was unsure about that himself. Foozy, do you
agree?


>
> Thanks
> Rodrigo
>
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>


More information about the Mercurial-devel mailing list