File Name Patterns Plan

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Thu Nov 24 14:22:25 EST 2016


At Thu, 24 Nov 2016 17:04:38 +0100,
Pierre-Yves David wrote:
> 
> Recently, Foozy created a Plan page for the matcher issues:
> 
> https://www.mercurial-scm.org/wiki/FileNamePatternsPlan
> 
> It is a good start but there is a couple of elements that are
> missing or still a bit fuzzy to me.

Thank you for the comments!

I'll investigate and update the Wiki page later, once my sleepy brain
can be trusted to think correctly :-)

This reply is just an FYI covering the easy/clear points, for reference
until the Wiki page is updated, even though you may know them already.

We should open a new discussion thread on devel-ml ASAP, citing some of
the points this mail mentions, to avoid scattering the discussion log
here and there, shouldn't we?


> 1) Default matcher:
> 
>    What is the default pattern mode ?
> 
>    When one do `hg files FOO` how is FOO processed. It seems like to be
>    'relpath:'. Double checking this would be useful and mentioning it
>    on the page is important.

Basically:

  =================== ======== =========
                      (default)
  case                type     recursion
  =================== ======== =========
  -I/-X               glob:       o
  "glob:" in hgignore relglob:    o (*1)
  pattern in fileset  glob:       x (*2)
  other "glob:"       glob:       x (*2)
  otherwise           relpath:    o (*2) (*3)
  =================== ======== =========

(*1) treated as "include" of match.match() internally
(*2) treated as "pats" of match.match() internally
(*3) usually, via scmutil.match() with default="relpath"

But:

> 2) Difference in command behavior:
> 
>    There seems to be some commands behaving differently than other,
>    notably `hg locates` have some strange kind of
>    raw-non-recursive-any-rooted matching by default. It seems to go back to
>    'relpath:' when using -I
> 
>    I wonder if there is other commands like this. It might be useful to
>    search for the default matcher on a command/flag basis.

Oh, I overlooked that:

  - "hg files" uses "relpath:" as default of scmutil.match(), but
  - "hg locate" uses "relglob:" explicitly

(the early commits introducing "hg files" may explain why)

I'll investigate.

 
> 3) Recursion behavior,
> 
>     There is some data about this in the page, but I think we need more
>     formal representation to have a clear view of the situation.
> 
>     The existing 'path:' and 'relpath:' are recursive in all cases,
>     while 'glob:' and 're:' variants are only recursive with -I/-E.
>     This is a key point because as far as I understand fixing this is a
>     core goal of the current plan.

  while 'glob:' variants are only recursive with -I/-X

  ('re:' is always recursive)

>     However, Foozy point out that using 'set:' with -I disable the
>     automatic recursion for 're' and 'glob', but not for 'path', so we
>     have more "variants" here.

  using 'set:' with -I disables the automatic recursion for 'glob', but
  not for 're' and 'path'

  ('re:' is always recursive)

 
>     (bonus point: Rodrigo use case can we fulfilled by adding 'set:' to
>     his selector.)
> 
>     I also wonder if there is other variants than "pattern", "-I" and
>     "-I + set:".
> 
>     Having a table with 'pattern-type / usage' listing the recursive
>     case would probably be a good start.

I'll investigate.


> 4) Reading from file,
> 
>    Foozy mention the pattern name in some file (hgignore) does not
>    match pattern name on the command line.
> 
>    I think it would be useful to be a bit more formal here. What kind
>    of file do we read pattern from? Do we have difference from 1 file
>    to another? what are the translation (and default), etc.

match.readpatternfile() substitutes the pattern-type of patterns in
the files it reads:

    glob => relglob
    re   => relre

    https://www.mercurial-scm.org/repo/hg/file/4.0/mercurial/match.py#l666

In Mercurial core, .hgignore (and files included indirectly by it or
by hgrc) is the only such case.
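
The substitution above can be illustrated with a small sketch. This is
a simplified model written for this mail, not the code in match.py: the
function name read_patterns() is hypothetical, comment handling is
naive, and it assumes that "regexp" is accepted as another name for
"re" and that regexp is the default syntax of such files.

```python
# Simplified model of the pattern-type substitution applied to pattern
# files such as .hgignore. NOT Mercurial's implementation.
SYNTAXES = {
    "glob": "relglob:",
    "re": "relre:",
    "regexp": "relre:",  # assumed alias of "re"
}


def read_patterns(text, default="relre:"):
    """Return the patterns in text, each prefixed with its 'rel' kind."""
    current = default
    patterns = []
    for line in text.splitlines():
        # Naive comment stripping: an escaped '\#' is not handled here.
        line = line.split("#", 1)[0].rstrip()
        if not line:
            continue
        if line.startswith("syntax:"):
            # "syntax: NAME" switches the kind used for later lines.
            name = line[len("syntax:"):].strip()
            current = SYNTAXES.get(name, current)
            continue
        if any(line.startswith(kind) for kind in set(SYNTAXES.values())):
            # Already written with a 'rel' kind: keep it untouched.
            patterns.append(line)
            continue
        for name, kind in SYNTAXES.items():
            prefix = name + ":"
            if line.startswith(prefix):
                # Per-line "glob:" / "re:" prefix: substitute the kind.
                patterns.append(kind + line[len(prefix):])
                break
        else:
            patterns.append(current + line)
    return patterns
```

With this model, a "glob:" written in the file never reaches the
matcher as "glob:"; it always arrives as "relglob:".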


> 5) Pattern-type table
> 
>    Foozy made many table explaining how variants are covered by
>    pattern type. Having a pattern centric summary will be useful.
> 
>    Proposal for columns:
> 
>    * pattern type;
>    * from cli or file;
>    * matching mode (raw, glob, or re),
>    * rooting (root, cwd or any),
>    * recursive when used as Pattern
>    * recursive when used with -I
> 
>    Having the same table for the proposed keyword would help to
>    understand inconsistency and similarity with

I'll update Wiki page.


> 6) file:/dir:
> 
>    I'm a bit confused here because Mercurial does not really track/work
>    on directories. What is is benefit of 'dir:' ? 'dir:' seems very
>    similar to 'path' am I missing something important?
> 
>    As I understand 'file:' could be useful for the non-recursive
>    part if we want to cover every single cases. Am I right?

Yes, 'file:' is used for strict non-recursive matching. 'dir:' is
listed as the opposite of 'file:', for coverage :-)

I have only one example use case for "dir:". If a file name and a
directory name collide with each other at merging, all commits related
not to the file FOO but to files under the directory FOO can be checked
by:

    $ hg log -r "file('path:FOO') and not file('file:FOO')"
    $ hg log -r "file('dir:FOO')"

Therefore, I don't have a strong opinion about implementing 'dir:' itself.
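
The relation between the three pattern-types in the example above can
be sketched as plain predicates. This is only an illustration of the
proposed semantics as described in this mail: the function names are
hypothetical, and both arguments are assumed to be root-relative names
using "/" as the separator.

```python
# Illustration of the proposed 'path:' / 'file:' / 'dir:' semantics.
# These helpers are hypothetical and are not part of Mercurial.

def match_path(pattern, filename):
    """'path:': the named file itself, or anything under it."""
    return filename == pattern or filename.startswith(pattern + "/")


def match_file(pattern, filename):
    """'file:': strictly the named file, never recursive."""
    return filename == pattern


def match_dir(pattern, filename):
    """'dir:': only files under the named directory."""
    return filename.startswith(pattern + "/")
```

Under these assumptions, match_dir(p, f) gives the same answer as
"match_path(p, f) and not match_file(p, f)" for every f, which is why
'dir:' adds convenience rather than new expressiveness.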


> 7) compatibility conclusion
> 
>    Getting a whole new set of matcher is a big step that have a
>    significant confusion step, we have to get it right
> 
>    We cannot change the default behavior (raw string) and this is what
>    people will find the most. So we have to be careful about
>    inconsistency here because we cannot change the behavior of this
>    current default. For example it is probably better that all the new
>    matcher very consistent with each other and that the behavior
>    mismatch between raw and the new official one is simple to grasp.
> 
>    In the same way, I do not think we'll be able to alias the old
>    pattern-type to the new-ones. Because we cannot fix recursion
>    behavior of the old ones.
>    There will be online material with the old one and we won't be able
>    to fix them. This is a lesser issue but we should probably keep it
>    in mind. (Without any serious backing I expect that pattern for
>    hgignore are probably the most documented online).

I think that the existing (= legacy) "glob:" can be implemented as an
alias of a new systematic pattern-type WITH an "additional suffix"
controlling the recursion of matching ("relglob:" can be handled
similarly).

  ================== ======== ========= ========= =================
  case               type     recursion alias of  additional suffix
  ================== ======== ========= ========= =================
  -I/-X              glob:       o      cwdglob:  (?:/|$)
  "glob" in hgignore relglob:    o      anyglob:  (?:/|$)
  pattern in fileset glob:       x      cwdglob:  $
  other "glob"       glob:       x      cwdglob:  $
  ================== ======== ========= ========= =================

The new systematic "*glob:" family doesn't match recursively unless
"**" is specified at the end of the pattern. Therefore, extra
explanation about recursion is needed only for "glob:" via -I/-X and
hgignore.
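
The effect of the "additional suffix" can be sketched as below. The
glob translator is a deliberately tiny stand-in written for this mail
(not Mercurial's own), and the names glob_to_regex(), compile_glob(),
RECURSIVE and STRICT are hypothetical. Only the two suffixes are taken
from the table above.

```python
import re


def glob_to_regex(glob):
    """Tiny glob translator: '**' crosses '/', while '*' and '?' do not."""
    out = []
    i = 0
    while i < len(glob):
        if glob.startswith("**", i):
            out.append(".*")
            i += 2
        elif glob[i] == "*":
            out.append("[^/]*")
            i += 1
        elif glob[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(glob[i]))
            i += 1
    return "".join(out)


RECURSIVE = "(?:/|$)"  # suffix for legacy "glob:" via -I/-X or hgignore
STRICT = "$"           # suffix for the other legacy "glob:" cases


def compile_glob(glob, suffix):
    """Compile a glob with the suffix that decides recursion."""
    return re.compile(glob_to_regex(glob) + suffix)


legacy_include = compile_glob("src/*", RECURSIVE)
legacy_pattern = compile_glob("src/*", STRICT)
```

Here legacy_include.match("src/lib/a.py") succeeds because the suffix
accepts a following "/", while legacy_pattern.match("src/lib/a.py")
fails; a strict pattern needs a trailing "**", as in "src/**", to reach
files in subdirectories.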

(sorry if I misunderstood your suggestion)


> Cheers,
> 
> -- 
> Pierre-Yves David
> 

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp

