[PATCH] match: adding non-recursive directory matching

Rodrigo Damazio rdamazio at google.com
Tue Dec 20 22:21:07 EST 2016


On Tue, Dec 20, 2016 at 5:47 AM, Pierre-Yves David <
pierre-yves.david at ens-lyon.org> wrote:

>
>
> On 12/20/2016 06:00 AM, Rodrigo Damazio wrote:
>
>> Unfortunately, while set would match the right files, because of the way
>> the code is structured, it provides no way to not try visiting the
>> directories inside the non-recursive match - the set needs to first
>> collect all the files in all subdirectories (match.py, _expandset) and
>> then filter that down to the desired ones. In plain hg repos, that's
>> just much slower - in the context of narrowhg, the repo will simply not
>> have the manifests for those subdirectories and trying to visit them
>> will crash.
>>
>
> Okay, so this seems like the current tools allow you to specify the right
> request but shortcoming of the -implementation- are preventing that request
> to work probably with narrowhg (and have performance impacts)
>
> Did I got that right ?


Yes.

The follow-up change to this one (which I haven't sent yet but is
>> written) is updating visitdir to allow non-recursiveness, which btw
>> makes something like "hg files -I rootglob:browser/*" about 4-5x faster
>> in the firefox repo.
>>
>
> And, If I read you right, the implementation of 'rootglob:' you provided
> in your patch have the same implementation issue, but you have another
> patch to improve the implementation to behave a way you can use (and is
> faster).
>
> Did I got that right too ?
>

Yes.

If I got these two pieces right, it looks like we could just apply the
> improvement to 'visitdir' to 'set:your/glob/*' and have your usecase filled
> while not jumping into UI changes. Would that work for you ?
>

Not without a third set of changes, since set expansion doesn't use
visitdir (or the matcher being built) at all - the dependency is that
building the matcher depends on expanding the set (and thus the set can't
depend on the matcher).
It would technically be doable for re:, but I'm wary of getting into the
business of parsing and special-casing regexes to assume what they match or
don't.


> On Fri, Dec 16, 2016 at 6:21 AM, Pierre-Yves David
>> <pierre-yves.david at ens-lyon.org <mailto:pierre-yves.david at ens-lyon.org>>
>> wrote:
>>
>>
>>
>>     On 12/16/2016 02:19 AM, Augie Fackler wrote:
>>
>>
>>             On Nov 24, 2016, at 10:28 AM, FUJIWARA Katsunori
>>             <foozy at lares.dti.ne.jp <mailto:foozy at lares.dti.ne.jp>> wrote:
>>
>>                     Yes, "files:" was the original version of this patch
>>                     and the case I really
>>                     care about :) I changed it to rootglob after your
>>                     comments.
>>                     Which way would be preferred to move forward?
>>
>>
>>             "files:" is "path:" family, and "rootglob:" is "glob:"
>>             family. As we
>>             concluded before, "path:" itself can't control recursion of
>>             matching
>>             well.
>>
>>             Therefore, I think that "files:" should be implemented if
>>             needed,
>>             regardless of implementing "rootglob:".
>>
>>             Of course, we need high point view of this area, at first :-)
>>
>>
>>             BTW, it is a little ambiguous (at least, for me) that
>>             "files:foo"
>>             matches against both file "foo" and files just under directory
>>             "foo". Name other than "files:" may resolve this ambiguity,
>>             but I
>>             don't have any better (and short enough) name :-<
>>
>>              ========== ==== ======= ===========
>>              pattern    foo  foo/bar foo/bar/baz
>>              ========== ==== ======= ===========
>>              path:foo    o     o         o
>>
>>              files:foo   o     o         x
>>
>>              file:foo    o     x         x
>>              dir:foo     x     o         o
>>              ========== ==== ======= ===========
>>
>>
>>         Scanning the plan page, I see that there’s a *lot* of work that
>>         could be done and no consensus as yet, but that the only
>>         immediate use case seems to be the rootfile/rootglob case. Is
>>         there some path forward we could agree on that would unblock
>>         those immediate needs for narrowhg and not make things harder in
>>         the future?
>>
>>         Alternatively, would we be okay with a slight refactor of the
>>         matcher so that narrowhg can introduce a custom filesonly:
>>         matcher for the time being so we can keep making forward
>>         progress there?  I don’t know the matcher code well enough to be
>>         able to guess if this is a reasonable path so we can be unblocked.
>>
>>         (It’s very hard for to justify the amount of work implied by
>>         reaching consensus on FileNamePatternsPlan and then executing
>>         the entire thing when what we need is solvable today with a
>>         sub-hour patch to existing code, thus my trying to find a
>>         solution we can all live with.)
>>
>>
>>     As far as I understand, Foozy finding shows that the feature
>>     narrowhg needs is already there and nothing new is necessary.
>>
>>     You can add "set:" in front of your glob to make it non recursive in
>>     all cases "set:your/directory/you/want/to/match/files/in/*"
>>
>>     If this does not fits your needs, this probably mean I got your
>>     usecase wrong. In that case can you re-explain the issue you are
>>     trying to solve here?
>>
>>
>>     At the project level, it will make sense to clean up the Pattern
>>     Matching at some point, and Foozy wiki work will help us to do that.
>>
>>     Cheers.
>>
>>     --
>>     Pierre-Yves David
>>
>>
>>
> --
> Pierre-Yves David
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20161220/18051a45/attachment.html>


More information about the Mercurial-devel mailing list