[PATCH 2 of 2 V2] filterlang: add a small language to filter files

Yuya Nishihara yuya at tcha.org
Fri Jan 12 07:00:33 EST 2018


On Thu, 11 Jan 2018 11:13:57 -0500, Matt Harbison wrote:
> 
> > On Jan 11, 2018, at 10:16 AM, Yuya Nishihara <yuya at tcha.org> wrote:
> > 
> >> On Thu, 11 Jan 2018 00:17:39 -0500, Matt Harbison wrote:
> >> # HG changeset patch
> >> # User Matt Harbison <matt_harbison at yahoo.com>
> >> # Date 1515641014 18000
> >> #      Wed Jan 10 22:23:34 2018 -0500
> >> # Node ID 548e748cb3f4eea0aedb36a2b2e9fe3b77ffb263
> >> # Parent  962b2bdd70d094ce4bf9a8135495788166b04510
> >> filterlang: add a small language to filter files
> > 
> >> I also made the 'always' token a
> >> predicate for consistency, and introduced 'never' to improve readability.
> > 
> > Perhaps '**' or '.' could be an "always" symbol given patterns are relative
> > to the repository root in filterlang.
> 
> I’m thinking ahead to a tracked file that could be converted to this language, and trying to make it readable. This construct seems weird to me:
> 
>   **.c = !**

Ah, okay. always()/never() or all()/none() makes sense there. I slightly
prefer all()/none() as fileset is the language for set operations, and we
have all() in revset.

> >> diff --git a/mercurial/filterlang.py b/mercurial/filterlang.py
> >> new file mode 100644
> >> --- /dev/null
> >> +++ b/mercurial/filterlang.py
> >> @@ -0,0 +1,73 @@
> >> +# filterlang.py - a simple language to select files
> > 
> > The module name seems too generic.
> > minifileset.py, ufileset.py, etc. or merge these functions into fileset.py?
> 
> minifileset.py I guess?  My concern with putting it in fileset.py is how to enforce the boundary clearly.

Seems fine.

> >> +def _compile(tree):
> >> +    op = tree[0]
> >> +    if op in ('symbol', 'string'):
> >> +        name = fileset.getstring(tree, 'invalid file pattern')
> >> +        op = name[0]
> >> +        if op == '*': # file extension test, ex. "*.tar.gz"
> >> +            return lambda n, s: n.endswith(name[1:])
> > 
> > Better to make sure no metacharacters in name[1:].
> 
> Aren’t meta characters allowed in a string, so as to not block certain file names?  Does this mean symbol and string have to be handled separately?

I meant '*.*' shouldn't be translated to n.endswith('.*'), for example.
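
To illustrate, here is a rough sketch of such a check. It is not the code
from the patch: compileext() and _GLOBMETA are hypothetical names, and the
exact set of characters to reject is an assumption.

```python
# Hypothetical sketch, not the actual patch. The helper name and the set of
# rejected characters are assumptions made for illustration.
_GLOBMETA = set('*?[]{}\\')

def compileext(name):
    """Return a matcher for a "*.ext" pattern, or raise ValueError."""
    if not name.startswith('*'):
        raise ValueError('not an extension pattern: %s' % name)
    ext = name[1:]
    # Reject "*", "*foo", and anything that still contains glob
    # metacharacters after the leading "*", such as "*.*".
    if not ext.startswith('.') or _GLOBMETA & set(ext):
        raise ValueError('unsupported file pattern: %s' % name)
    # Same (name, size) signature as the lambdas in _compile().
    return lambda n, s: n.endswith(ext)
```

With this, "*.tar.gz" compiles to a plain suffix test, while "*.*" is
rejected instead of silently becoming n.endswith('.*').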

> >> +    elif op in ['or', 'and']:
> >> +        funcs = [_compile(t) for t in tree[1:]]
> >> +        summary = {'or': any, 'and': all}[op]
> >> +        return lambda n, s: summary(f(n, s) for f in funcs)
> > 
> > IIRC, ('or'/'and', x, y) isn't flattened in fileset.py, so the tree would have
> > exactly 2 operands.
> 
> fileset.andset() calls getset(), which checks the arg, but maybe that’s an artifact of other uses.

That's probably for "()", an empty group.

Here I meant that any()/all() will always be given a list of exactly two
elements. Just for the record, that isn't a problem.
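
As a small sketch of why it is harmless (compilebinary() is a hypothetical
stand-in for the 'or'/'and' branch of _compile(), taking already compiled
operands), any()/all() give the same result as a plain binary or/and when
there are exactly two functions:

```python
# Hypothetical stand-in for the 'or'/'and' branch of _compile(); the
# operands are assumed to be already-compiled (name, size) matchers.
def compilebinary(op, left, right):
    summary = {'or': any, 'and': all}[op]
    funcs = [left, right]  # always exactly two operands
    # any()/all() over a generator short-circuit just like or/and.
    return lambda n, s: summary(f(n, s) for f in funcs)

isc = lambda n, s: n.endswith('.c')
islarge = lambda n, s: s > 10

both = compilebinary('and', isc, islarge)
either = compilebinary('or', isc, islarge)
```

Here both('a.c', 5) is false and either('a.c', 5) is true, matching what
"isc and islarge" / "isc or islarge" would give.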

