File Name Patterns Plan

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Fri Dec 2 12:43:53 EST 2016


At Thu, 24 Nov 2016 17:04:38 +0100,
Pierre-Yves David wrote:
> 
> Recently, Foozy created a Plan page for the matcher issues:
> 
> https://www.mercurial-scm.org/wiki/FileNamePatternsPlan
> 
> It is a good start but there is a couple of elements that are
> missing or still a bit fuzzy to me.
> 
> 
> 1) Default matcher:
> 
>    What is the default pattern mode ?
> 
>    When one does `hg files FOO`, how is FOO processed? It seems to be
>    'relpath:'. Double checking this would be useful and mentioning it
>    on the page is important.
> 
> 2) Difference in command behavior:
> 
>    There seem to be some commands behaving differently than others,
>    notably `hg locate` has some strange kind of
>    raw-non-recursive-any-rooted matching by default. It seems to go back to
>    'relpath:' when using -I
> 
>    I wonder if there is other commands like this. It might be useful to
>    search for the default matcher on a command/flag basis.
> 
> 3) Recursion behavior,
> 
>     There is some data about this in the page, but I think we need more
>     formal representation to have a clear view of the situation.
> 
>     The existing 'path:' and 'relpath:' are recursive in all cases,
>     while 'glob:' and 're:' variants are only recursive with -I/-E.
>     This is a key point because as far as I understand fixing this is a
>     core goal of the current plan.
> 
>     However, Foozy points out that using 'set:' with -I disables the
>     automatic recursion for 're' and 'glob', but not for 'path', so we
>     have more "variants" here.
> 
>     (bonus point: Rodrigo's use case can be fulfilled by adding 'set:' to
>     his selector.)
> 
>     I also wonder if there is other variants than "pattern", "-I" and
>     "-I + set:".
> 
>     Having a table with 'pattern-type / usage' listing the recursive
>     case would probably be a good start.
> 
> 4) Reading from file,
> 
>    Foozy mentions that the pattern name in some files (hgignore) does not
>    match the pattern name on the command line.
> 
>    I think it would be useful to be a bit more formal here. What kind
>    of file do we read pattern from? Do we have difference from 1 file
>    to another? what are the translation (and default), etc.
>
> 5) Pattern-type table
> 
>    Foozy made many table explaining how variants are covered by
>    pattern type. Having a pattern centric summary will be useful.
> 
>    Proposal for columns:
> 
>    * pattern type;
>    * from cli or file;
>    * matching mode (raw, glob, or re),
>    * rooting (root, cwd or any),
>    * recursive when used as Pattern
>    * recursive when used with -I
> 
>    Having the same table for the proposed keyword would help to
>    understand inconsistency and similarity with

For (1) - (5) above, I have just revised the "Current status" section
of the FileNamePatternsPlan wiki page as shown below.

=========================================================
Summary of mode, relative-to, and recursion of each type
=========================================================

  ============= ======= ======== =========== ========== =========
  mode          root-ed cwd-ed   any-of-path control    context
                                             recursion  dependent
                                             by pattern recursion
  ============= ======= ======== =========== ========== =========
  wildcard      ---     glob:    relglob:    by **      o
  regexp        re:     ---      relre:      by $       x (*A)
  raw string    path:   relpath: ---         (always)   x
  ============= ======= ======== =========== ========== =========

(*A) An ignore pattern in "regexp" mode matches recursively (e.g.
     're:^foo$' in .hgignore ignores file 'foo/bar'). Details are
     explained later.
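
The summary table above can also be read as a lookup from pattern type
to matching mode and rooting. The following is only a sketch of that
reading: PATTERN_TYPES and describe() are hypothetical names chosen
for illustration and are not part of Mercurial's API.

```python
# Lookup built by hand from the summary table above.
# PATTERN_TYPES and describe() are hypothetical, not Mercurial API.
PATTERN_TYPES = {
    # type:     (mode,         relative-to)
    "glob":    ("wildcard",   "cwd"),
    "relglob": ("wildcard",   "any"),
    "re":      ("regexp",     "root"),
    "relre":   ("regexp",     "any"),
    "path":    ("raw string", "root"),
    "relpath": ("raw string", "cwd"),
}


def describe(kind):
    """Return a one-line description of a pattern type."""
    mode, rooting = PATTERN_TYPES[kind]
    return "%s: is a %s pattern, relative to %s" % (kind, mode, rooting)


print(describe("relglob"))
```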

===================================================
List of contexts in which a pattern is specified
===================================================

  ========================= ============ =========== ===============
  pattern for               default type recursion   related API
                                         of wildcard
  ========================= ============ =========== ===============
  fileset                   glob:         x          ctx.match()
  files() template function glob:         x          ctx.match()
  diff() template function  glob:         o (*1)     ctx.match()

  file() revset predicate   glob:         x          match.match()
  follow() revset predicate path:         x          match.match()
  --include/--exclude       glob:         o (*1)     match.match()
  hgignore                  relre:        o (*1)     match.match()

  'archive' web command     path:         - (*2)     scmutil.match()
  'hg locate'               relglob:      x          scmutil.match()
  'hg log'                  relpath:      x          scmutil.matchandpats()
  others (e.g. 'hg files')  relpath:      x          scmutil.match()
  ========================= ============ =========== ===============

(*1) treated as 'include'/'exclude' of match.match()
     (otherwise, treated as 'pats' of match.match())

(*2) no "wildcard" pattern matching occurs for the 'archive' web command,
     because 'path:' is forcibly added to the specified pattern in this case
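
To illustrate how the "default type" column applies, the sketch below
picks the type of a pattern from its explicit prefix, falling back to
the default of the context. It is a simplified model: DEFAULT_TYPE,
KNOWN_KINDS and split_kind() are hypothetical names, the context keys
are shortened labels for rows of the table above, and only the kinds
named in this mail are recognized.

```python
# Default pattern type per context, copied from the table above.
# All names here are hypothetical and exist only for illustration.
DEFAULT_TYPE = {
    "fileset": "glob",
    "file() revset predicate": "glob",
    "follow() revset predicate": "path",
    "--include/--exclude": "glob",
    "hgignore": "relre",
    "hg locate": "relglob",
    "hg log": "relpath",
    "others": "relpath",
}

# Only the kinds mentioned in this mail (assumption of the sketch).
KNOWN_KINDS = ("glob", "relglob", "re", "relre", "path", "relpath",
               "set", "listfile", "include")


def split_kind(pattern, context):
    """Return (kind, body), using the context's default if no prefix."""
    kind, sep, body = pattern.partition(":")
    if sep and kind in KNOWN_KINDS:
        return kind, body
    return DEFAULT_TYPE[context], pattern


print(split_kind("foo/bar", "hg locate"))
print(split_kind("glob:*.c", "others"))
```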

For "recursion of wildcard":

  - if "recursion of wildcard" is "o", pattern 'glob:foo/bar' matches
    file 'foo/bar/baz', for example

  - if multiple contexts are combined, the innermost context decides
    "recursion of wildcard"

For example, file 'foo/bar/baz' is:

  - not matched at: hg files glob:foo/bar
  - not matched at: hg files -I "set:'glob:foo/bar'"
  - but matched at: hg files -I glob:foo/bar

The last case seems to cause the issue originally mentioned by
Rodrigo, and the second case can be used as an immediate workaround
for that issue.
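
One way to picture the three cases above is that the same pattern is
closed either by a plain end-of-string anchor (non-recursive) or by an
anchor that also accepts a following '/' (recursive). The sketch below
is only a model under that assumption, not the actual matcher code; it
handles literal patterns such as 'foo/bar' only, and matches() is a
hypothetical helper.

```python
import re


def matches(literal, path, recursive):
    """Model of "recursion of wildcard" for a literal pattern.

    recursive=False: the pattern must match the whole path.
    recursive=True:  the pattern may also match a leading directory.
    """
    suffix = r"(?:/|$)" if recursive else r"$"
    return re.match(re.escape(literal) + suffix, path) is not None


# hg files glob:foo/bar      (pattern context, not recursive)
print(matches("foo/bar", "foo/bar/baz", recursive=False))
# hg files -I glob:foo/bar   (include context, recursive)
print(matches("foo/bar", "foo/bar/baz", recursive=True))
```

Note that even in the recursive case 'foo/bar' does not match
'foo/barbaz', because recursion follows directory boundaries.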

For a pattern read from a file, "recursion of wildcard" follows the
context that reads that file in. For example:

  - wildcard pattern read in by '-I listfile:FILE' matches recursively, but
  - one read in by 'hg status listfile:FILE' doesn't

==========================
Reading patterns from file
==========================

  =============== ============ ============ =============
                  type         default type default type
                  substitution for hgignore for otherwise
  =============== ============ ============ =============
  include:FILE    o            relre:       relre:
  listfile:FILE   x            (*X)         (*Y)
  =============== ============ ============ =============

(*X) this is prohibited by match.readpatternfile()

(*Y) the "default type" depends on the context in which
     'listfile:FILE' is used (e.g. 'relglob:' for 'hg locate', but
     'relpath:' for 'hg files').

If "type substitution" is "o", the substitutions below always occur
when patterns are read from the file. This is mentioned in 'hg help
patterns' and 'hg help hgignore', but the types 'relglob:' and
'relre:' themselves aren't explained there.

  - glob: => relglob:
  - re:   => relre:
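
As a rough illustration of this substitution, the sketch below
rewrites the type prefix of one line read via 'include:FILE'. It is
not the code of match.readpatternfile(): substitute_type() is a
hypothetical helper, and treating every other line as a pattern of the
default type is an assumption of this sketch.

```python
# Substitutions listed above, applied when reading patterns from a file.
SUBSTITUTION = {"glob:": "relglob:", "re:": "relre:"}


def substitute_type(line, default="relre:"):
    """Return the line with its type prefix substituted.

    Lines without a recognized prefix get the default type
    (assumption of this sketch).
    """
    for old, new in SUBSTITUTION.items():
        if line.startswith(new):
            return line                    # already in substituted form
        if line.startswith(old):
            return new + line[len(old):]   # glob: => relglob:, re: => relre:
    return default + line


print(substitute_type("glob:*.o"))
print(substitute_type("re:^foo$"))
```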

Reading from '.hgignore' and '[ui] ignore' is treated as a variant of
'include:' internally (e.g. 'include:$REPOROOT/.hgignore')

============================
Recursion of ignore patterns
============================

As ignore patterns, "wildcard" and "raw string" modes are obviously
recursive, because:

  - being treated the same as '--include PATTERN' makes "wildcard" mode
    recursive
  - "raw string" mode is always recursive, regardless of context

On the other hand, "regexp" mode itself is non-recursive. For example,
with 're:^foo$' in .hgignore, 'hg debugignore' shows a regexp that
doesn't match file 'foo/bar'.

But actually, 're:^foo$' in .hgignore ignores file 'foo/bar', because
dirstate (and 'hg debugignore') examines whether the specified file:

  - matches the specified ignore patterns, or
  - exists under a directory that matches the specified ignore patterns

and the file is ignored if either condition above is true.
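
The two conditions above can be sketched as a check of the file
itself plus each of its parent directories. This is a simplified model
for illustration, not the actual dirstate code; is_ignored() is a
hypothetical name, and the patterns are plain Python regexps searched
anywhere in the path, as an approximation of 'relre:'.

```python
import re


def is_ignored(path, ignore_regexps):
    """True if the path or any of its parent directories matches."""
    parts = path.split("/")
    # For 'foo/bar/baz' this yields 'foo', 'foo/bar' and 'foo/bar/baz'.
    candidates = ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return any(re.search(rx, c) is not None
               for rx in ignore_regexps
               for c in candidates)


# '^foo$' does not match 'foo/bar' itself, but matches its parent 'foo'.
print(is_ignored("foo/bar", ["^foo$"]))
print(is_ignored("foobar/baz", ["^foo$"]))
```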

Therefore, a "regexp" ignore pattern is recursive, even if it uses '$'.

In conclusion, all ignore patterns are treated as recursive,
regardless of pattern type.

This special recursion of "regexp" mode is specific to ignore
patterns. In other cases, a "regexp" mode pattern isn't recursive if
it uses '$'.


> 6) file:/dir:
> 
>    I'm a bit confused here because Mercurial does not really track/work
>    on directories. What is the benefit of 'dir:'? 'dir:' seems very
>    similar to 'path'. Am I missing something important?
> 
>    As I understand 'file:' could be useful for the non-recursive
>    part if we want to cover every single cases. Am I right?
> 
> 7) compatibility conclusion
> 
>    Getting a whole new set of matcher is a big step that have a
>    significant confusion step, we have to get it right
> 
>    We cannot change the default behavior (raw string) and this is what
>    people will find the most. So we have to be careful about
>    inconsistency here because we cannot change the behavior of this
>    current default. For example it is probably better that all the new
>    matchers are very consistent with each other and that the behavior
>    mismatch between raw and the new official one is simple to grasp.
> 
>    In the same way, I do not think we'll be able to alias the old
>    pattern-type to the new-ones. Because we cannot fix recursion
>    behavior of the old ones.
>    There will be online material with the old one and we won't be able
>    to fix them. This is a lesser issue but we should probably keep it
>    in mind. (Without any serious backing I expect that pattern for
>    hgignore are probably the most documented online).
> 
> Cheers,
> 
> -- 
> Pierre-Yves David
> 

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp

