[PATCH] revset: introduce the summary predicate

Mon Jan 9 23:33:17 EST 2017

On Mon, 09 Jan 2017 05:49:23 -0500, Pierre-Yves David  
<pierre-yves.david at ens-lyon.org> wrote:

>
>
> On 01/08/2017 09:34 PM, Matt Harbison wrote:
>> On Sun, 08 Jan 2017 07:59:36 -0500, Pierre-Yves David
>> <pierre-yves.david at ens-lyon.org> wrote:
>>
>>> (ha, I wrote my previous reply in a train and it got sent when I
>>> connected again (and received that one). I'm going to try to adress
>>> the new content in this email and sometime repeat some of my other
>>> reply content for clarity)
>>>
>>> On 01/08/2017 04:23 AM, Matt Harbison wrote:
>>>> On Sat, 07 Jan 2017 02:56:48 -0500, Yuya Nishihara <yuya at tcha.org>
>>>> wrote:
>>>>
>>>>> On Fri, 6 Jan 2017 21:29:43 -0500, Matt Harbison wrote:
>>>>>> > On Jan 6, 2017, at 11:19 AM, Pierre-Yves David
>>>>>> <pierre-yves.david at ens-lyon.org> wrote:
>>>>>> >> On 01/04/2017 07:04 PM, Matt Harbison wrote:
>>>>>> >> # HG changeset patch
>>>>>> >> # User Matt Harbison <matt_harbison at yahoo.com>
>>>>>> >> # Date 1483550016 18000
>>>>>> >> #      Wed Jan 04 12:13:36 2017 -0500
>>>>>> >> # Node ID 76d95ab94b9e206363629059fb7824002e19a9e5
>>>>>> >> # Parent  0064a1eb28e246ded9b726c696d048143d1b23f1
>>>>>> >> revset: introduce the summary predicate
>>>>>> >>
>>
>>>>> Perhaps stringmatcher can have 3 types, icase literal, literal, and
>>>>> re, and
>>>>> the default of desc() is icase literal for backward compatibility.  
>>>>> You
>>>>> can
>>>>> build a case-insensitive regexp object from a literal pattern.
>>>>>
>>>>> https://docs.python.org/2/library/re.html#re.I
>>>>
>>>> Yep, that's the API I was thinking of.
>>>>
>>>> I'm confused by the rest of your comments.  When I first skimmed your
>>>> message, adding support for 'icasere:' using this API popped into my
>>>> mind.  And that could support a case insensitive literal, because
>>>> 'icasere:foo' should be equivalent to looking for the substring 'foo'
>>>> (leaving aside efficiency, how discoverable that is, and that
>>>> stringmatcher matches the whole string for literals).  But you seem to
>>>> be suggesting adding 'icaseliteral:'.
>>>
>>> I'm not 100% sure of what Yuya actually has in mind but here is my
>>> understanding of the situation and how we could move forward.
>>>
>>> Currently:
>>> ----------
>>>
>>>    desc(X) → X is customly matched as a case insensitive litteral,
>>>
>>>    We have a "generic" pattern definition syntax used by various other
>>> reveset (implemented in "stringmatcher")
>>>
>>>      foo(X)
>>>        → X is matched as a case sensitive litteral
>>>      foo('literal:X')
>>>        → X is matched as a case sensitive literal (same as the above)
>>>      food('re:X')
>>>        → X is matched as a regular expression (case sensitive)
>>>
>>> Proposal: (might be what yuya says)
>>> ---------
>>>
>>> extend the string matcher to
>>>
>>>    foo('literal:X')
>>>      → X is matched as a case sensitive literal
>>
>> See the comment in the new patch I sent about 'user()' already
>> lowercasing 'literal:' and 're:'.  I'd consider it a bug, but it's been
>> in since mid 2012.  Attempting to channel Matt, I'm guessing we are
>> stuck with that since it is so old, but wanted to see what others think.
>
> 1) Yep, we are stuck with whatever existing behavior we have for  
> existing predicate because of BC. (but we can augment it)
>
> 2) Congratulation you seems to have unearthed an area where we have many  
> predicated with close but slightly different behavior. At that point  
> I'll ask you an inventory of what we currently have so that we can  
> devise a sound and as consistent as possible way forward.
>
>    Can you provide us with a table that at least keep track of:
>
> * predicate
> * default behavior
> * support 'rich' stringmatcher ?
> * are 'literal:' case sensitive ?
> * are 're:' case sensitive (and supported at all) ?

TL;DR: desc, grep, keyword, and author/user are the oddballs.  grep and  
keyword are the regex and literal halves respectively, of the same search.

After stripping out non string predicates, we basically end up with 4  
groups:

- util.stringmatcher based:

     Predicate:            Case Sensitive?
     "author"                   N
     "user"                     N
     "bookmark"                 Y
     "branch"                   Y
     "extra"                    Y
     "named"                    Y
     "subrepo"  [1]             Y
     "tag"                      Y

These all support 'literal:' and 're:'.  Case sensitivity applies the same  
to both prefixes, and raw pattern.

[1] Not documented to support 'literal:' or 're:' prefixes.

- Local method implementation based:

     Predicate:            Case Sensitive?
     "desc"                     N
     "grep"                     Y
     "keyword"                  N

None of these support prefixes.  The grep param is a regex without 're:',  
so it doesn't make a lot of sense to support stringmatcher here- what  
stringmatcher thinks is literal is really regex.  If we internally bolt on  
're:', it still can't support literal matches.

- match.py based (not relevant, but for completeness):

     "adds", "contains", "file", "filelog", "follow",
     "followlines", "modifies", "removes"

- "bisect" (I can't see how 're:' support here would be meaningful.)

>  From there we'll be able to see if a pattern emerge and pick the best  
> way to move forward.

The following thoughts come to mind:

A) author/user has been thoroughly corrupted with the lower casing.  Maybe  
come up with a 3rd one that follows modern rules, and deprecate/hide  
these?  Sad, because these seem natural.  OTOH, having raw pattern and  
prefixed patterns behave the same everywhere is elegant and simple.  Not  
sure what to call it.  'committer' comes to mind, but that has git  
implications.  That was suggested when I proposed recording who performed  
a graft in the 'extra' dict.  Then it was pointed out this tracking was  
related to the chain of custody proposal by Greg, so I let it drop [1].   
My enthusiasm was dampened some when I saw templates also have {author}  
and {user}, though the latter is a filter, not a reference to the field.

[1]  
https://www.mercurial-scm.org/pipermail/mercurial-devel/2015-April/068692.html

B) We should maybe fold grep and keyword into a new predicate ('search'?)  
that follows modern formatting, and deprecate/hide these two.  Simple, and  
drops the visible revset count by 1.  I'm not sure if 'grep' was borrowed  
 from git, or if it's just inspired by the unix command.

C) If we do A + B, that means 'desc' is the only oddball left.  I don't  
like the idea that case sensitivity for a raw pattern and a 'literal:'  
prefixed pattern would differ.  They are both literals in my mind, and it  
would be the one remaining exception.  The 're:' prefix could follow  
regular rules.

D) I guess we could hide/replace 'desc' with something new as well.  
('message'?)  But I'm wondering about the BC rules.  I think generally, we  
don't change behavior without the user doing some action to opt in.  But  
the 're:' prefix was added after many of these predicates existed for  
years.  Obviously, that broke anybody's search that started with 're:',  
just by them upgrading Mercurial.  If making them change the query to  
prefix 'literal:' was acceptable, why would changing the case sensitivity  
not be, assuming we provide 'icase-literal:'?  Is it just a general  
feeling that searches for 're:' are rare?  I understand the principle and  
motivation, but sometimes it's hard to figure out how they are applied.

>>>    foo('icase-literal:X')
>>>      → X is matched as a case insensitive literal
>>>    food('re:X')
>>>      → X is matched as a regular expression (case sensitive)
>>>
>>> Then, desc move to use string matcher (default to "icase-literal").
>>>
>>> We do not need a 'icase-re:' spec, because one can easily achieve it
>>> using 're:(?i)foo'
>>
>> Ah! I missed the part in the docs where flags could be set in the string
>> with (?<flag>). I thought you needed to compile with re.FLAG.  When he
>> said string literal, my mind went right to the 'literal:' prefix.
>> Agreed, no need for 'icase-re:'.
>
> Someone getting slightly confused with regular expression? Impossible!  
> ;-)
>
>>>> […]
>>>> I'm about to submit a patch to add the current 're:' support to 'desc'
>>>> in the meantime, to hopefully move this along.
>>>
>>> Great!
>>>
>>>>  I'd also be curious if
>>>> you have thoughts on how to conditionally limit this predicate to the
>>>> first line, without limiting future functionality.
>>>
>>> So having digged the regexp part a bit more, it seems like one could
>>> just use 're:^.*issue1337' to match "issue1337" on the first line ('.'
>>> does not match new-line by default).
>>
>> Thanks for looking at that.  It's way less horrible than I thought it
>> would be.  I'm curious what Sean thinks, since he mentioned {firstline}
>> being put in as a substitute for a complex regex.  I'd be fine with
>> skipping the firstline=True param if this case is mentioned in the help
>> for desc().
>
> I've missed that '{firstline}' proposal from Sean, can you point me at  
> it? (or summarize it ?)

Not a proposal, so much as historical knowledge about the template?

https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-January/092099.html

> Thanks a lot for looking into this!
>
> Cheers,