[PATCH] revset: introduce the summary predicate

Wed Jan 11 09:15:28 EST 2017

On Tue, 10 Jan 2017 20:09:45 -0500, Matt Harbison wrote:
> On Mon, 09 Jan 2017 23:33:17 -0500, Matt Harbison <mharbison72 at gmail.com>  
> wrote:
> > On Mon, 09 Jan 2017 05:49:23 -0500, Pierre-Yves David  
> >> * predicate
> >> * default behavior
> >> * support 'rich' stringmatcher ?
> >> * are 'literal:' case sensitive ?
> >> * are 're:' case sensitive (and supported at all) ?
> >
> > TL;DR: desc, grep, keyword, and author/user are the oddballs.  grep and  
> > keyword are the regex and literal halves respectively, of the same  
> > search.
> >
> > After stripping out non string predicates, we basically end up with 4  
> > groups:
> >
> > - util.stringmatcher based:
> >
> >      Predicate:            Case Sensitive?
> >      "author"                   N
> >      "user"                     N
> >      "bookmark"                 Y
> >      "branch"                   Y
> >      "extra"                    Y
> >      "named"                    Y
> >      "subrepo"  [1]             Y
> >      "tag"                      Y
> >
> > These all support 'literal:' and 're:'.  Case sensitivity applies the  
> > same to both prefixes, and raw pattern.
> >
> > [1] Not documented to support 'literal:' or 're:' prefixes.
> >
> >
> > - Local method implementation based:
> >
> >      Predicate:            Case Sensitive?
> >      "desc"                     N
> >      "grep"                     Y
> >      "keyword"                  N
> >
> > None of these support prefixes.  The grep param is a regex without  
> > 're:', so it doesn't make a lot of sense to support stringmatcher here-  
> > what stringmatcher thinks is literal is really regex.  If we internally  
> > bolt on 're:', it still can't support literal matches.
> >
> >
> > - match.py based (not relevant, but for completeness):
> >
> >      "adds", "contains", "file", "filelog", "follow",
> >      "followlines", "modifies", "removes"
> >
> > - "bisect" (I can't see how 're:' support here would be meaningful.)
> >
> >
> >>  From there we'll be able to see if a pattern emerge and pick the best  
> >> way to move forward.

Thanks for the nice summary. Inline comments follow.

> > The following thoughts come to mind:
> >
> > A) author/user has been thoroughly corrupted with the lower casing.   
> > Maybe come up with a 3rd one that follows modern rules, and  
> > deprecate/hide these?  Sad, because these seem natural.  OTOH, having  
> > raw pattern and prefixed patterns behave the same everywhere is elegant  
> > and simple.  Not sure what to call it.  'committer' comes to mind, but  
> > that has git implications.  That was suggested when I proposed recording  
> > who performed a graft in the 'extra' dict.  Then it was pointed out this  
> > tracking was related to the chain of custody proposal by Greg, so I let  
> > it drop [1].  My enthusiasm was dampened some when I saw templates also  
> > have {author} and {user}, though the latter is a filter, not a reference  
> > to the field.
> >
> > [1]  
> > https://www.mercurial-scm.org/pipermail/mercurial-devel/2015-April/068692.html

Yeah, that would be unfortunate to deprecate author/user() since they are very
common terms. Also, adding case-sensitive author() would not provide much user
benefit even though that could help eliminating the inconsistency.

So I'm -1 for (A).

On the other hand, I think we should fix the re: case to not lowercase the
pattern but to perform case-insensitive matching so '\B' won't be '\b'.

> > B) We should maybe fold grep and keyword into a new predicate  
> > ('search'?) that follows modern formatting, and deprecate/hide these  
> > two.  Simple, and drops the visible revset count by 1.  I'm not sure if  
> > 'grep' was borrowed  from git, or if it's just inspired by the unix  
> > command.
> 
> I had second thoughts about this.  I think if we just add documentation to  
> 'keyword' that says "If you want to search these fields by regex or case  
> sensitively, use 'grep'".  Then the user can see how to get everything  
> stringmatcher will provide, and it makes it clear to developers that  
> 'keyword' doesn't need string stringmatcher support.  I'll submit a patch  
> during the freeze.

The documentation thing sounds nice. I also like the idea of deprecating
grep() since grep() sounds like searching the file contents.

> That leaves only 'author' and 'desc' as not providing the full  
> functionality of the others.
> 
> > C) If we do A + B, that means 'desc' is the only oddball left.  I don't  
> > like the idea that case sensitivity for a raw pattern and a 'literal:'  
> > prefixed pattern would differ.  They are both literals in my mind, and  
> > it would be the one remaining exception.  The 're:' prefix could follow  
> > regular rules.

I won't insist that 'literal:' must be case-sensitive (because of (A).)
However, I would guess 're:' is also case-insensitive if 'literal:' is.
In my mindset, desc() is a case-insensitive matcher in that case.

So I lean towards adding case-insensitive desc('re:'), which would be at least
consistent in that desc() always ignores cases.

> > D) I guess we could hide/replace 'desc' with something new as well.  
> > ('message'?)  But I'm wondering about the BC rules.  I think generally,  
> > we don't change behavior without the user doing some action to opt in.   
> > But the 're:' prefix was added after many of these predicates existed  
> > for years.  Obviously, that broke anybody's search that started with  
> > 're:', just by them upgrading Mercurial.  If making them change the  
> > query to prefix 'literal:' was acceptable, why would changing the case  
> > sensitivity not be, assuming we provide 'icase-literal:'?  Is it just a  
> > general feeling that searches for 're:' are rare?  I understand the  
> > principle and motivation, but sometimes it's hard to figure out how they  
> > are applied.

My intuition about the BC rule is that we can slightly change the behavior if

 a) vast majority would be unaffected (e.g. 're:' is rare)
 b) there's a good way to get along with the BC (e.g. using 'literal:')

I bet some people would rely on the current case-insensitive desc(), so
I don't think we can change the default.

> >> I've missed that '{firstline}' proposal from Sean, can you point me at  
> >> it? (or summarize it ?)
> >
> > Not a proposal, so much as historical knowledge about the template?
> >
> > https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-January/092099.html

I thought that the proposal was something like
search_in_template_output('re:issue\d+', '{desc|firstline'}').