[PATCH] match: adding non-recursive directory matching

Rodrigo Damazio rdamazio at google.com
Tue Jan 24 01:02:03 UTC 2017


Getting back to this after the end-of-year hiatus (yes, I know it happens
to be during another code freeze :) I seem to have good timing).

On Tue, Dec 27, 2016 at 2:14 AM, Pierre-Yves David <
pierre-yves.david at ens-lyon.org> wrote:

>
>
> On 12/21/2016 04:21 AM, Rodrigo Damazio wrote:
>
>>     If I got these two pieces right, it looks like we could just apply
>>     the improvement to 'visitdir' to 'set:your/glob/*' and have your
>>     usecase filled while not jumping into UI changes. Would that work
>>     for you ?
>>
>>
>> Not without a third set of changes, since set expansion doesn't use
>> visitdir (or the matcher being built) at all - the dependency is that
>> building the matcher depends on expanding the set (and thus the set
>> can't depend on the matcher).
>> It would technically be doable for re:, but I'm wary of getting into the
>> business of parsing and special-casing regexes to assume what they match
>> or don't.
>>
>
> Rodrigo and I chatted directly about this a couple of days ago. Here is a
> quick summary of my new understanding of the situation.
>
> Fileset
> -------
>
> Fileset (behind "set:") can give the right result, but it is powered by
> not very modern code, it follow the old revset principle of "get everything
> and then run filters on that everything". That does not fit Rodrigo needs
> at all. It was easy to make 'set:' a bit smarter in the simple case but
> then we get into the issue that the matcher class is using 'set:' in a
> strange, non-lazy, way that does not use all the 'visitdir' feature
> Rodrigo/Google needs.
>
> So in short, fileset needs a rework before being usable in a demanding
> context.
>
>
> Current path restriction capability
> -----------------------------------
>
> The 'Match' class already have logic to restrict the path visited
> (implemented in the 'visitdir' method). To clarify, this logic as no effect
> on the returned match but is only an optimization for the directory we
> visit. It seems to only kicks in when treemanifest is used.
> This logic already works with a couple of patterns type (all pattern use
> the same class). However, that logic currently do not support the case were
> one want to select some subdirectory and skips the rest of the subtrees
> under it.
>

That is correct.

note: Rodrigo, you seems to have a good understanding of the logic. Do you
> think you could document the involved attributes (_includeroots,
> _includedirs, _excluderoots, etc) That would help a lot the next poor souls
> looking at the code.
>

Sure. It took me a while to understand that "roots" means "recursive
directories" and "dirs" means "non-recursive directories" in that code - it
all became much more clear after that. I'll be sure to add comments in my
patch and/or rename the attributes.


>
> Way forward
> -----------
>
> That limitation in the matcher class optimization is the main blocker for
> Rodrigo/Google needs. The optimization is independent of the UI part we
> actually provides to user as all patterns use the same matcher class and
> some existing class could already benefit from this optimization.
>
> Rodrigo seems to have a patch to update the matcher code to track and
> optimize the "subdir-but-not-subtree" case. He has not submitted this patch
> yet. Submitting that patches seems the next step to me. It will get the
> matcher code in a state that can actually be used for the
> narrowhg+treemanifest usecase.
>
> Once that code is in, it seems easy to make sure various patterns use it
> basic, easily recognizable cases. We poked at updating the code to have
> basic regexp matching a subtree recognized as such and that was quite easy.
>
>
> Rodrigo, does that match your current understanding of the situation?
>

It does.
And just to clarify on the patches - I sent an initial patch, then after
comments changes it significantly, so those are two different changes:

   - The first implements a "files:" matcher which matches all files inside
   a directory, non-recursively. This has no wildcards, so special-casing it
   in visitdir and any other places needed results in clean and simple code
   ("if it's files:, don't recurse").
   - The second implements "rootglob:" which allows any number of wildcards
   at point in the path, and is part of Foozy's plan for the new set of
   matchers. This adds some complexity in splitting dirs and roots (mentioned
   above) by having to parse the wildcards, and then the visitdir change looks
   less clean ("if it's a rootglob that has a single /* wildcard at the end,
   then don't recurse" - other cases are possible but start to get more
   complex).

For these reasons, I'd still prefer to get "files:" or similar in, but I'm
open for doing it either way. Please advise on the preferred way and I'll
send an updated patch (2 patches really - one for the matcher, one for the
visitdir optimization which makes it work with narrow).

Thanks
Rodrigo
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20170123/1e05a99a/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 4847 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20170123/1e05a99a/attachment.bin>


More information about the Mercurial-devel mailing list