[PATCH] revset: introduce the summary predicate

Fri Jan 6 21:29:43 EST 2017

> On Jan 6, 2017, at 11:19 AM, Pierre-Yves David <pierre-yves.david at ens-lyon.org> wrote:
> 
> 
> 
>> On 01/04/2017 07:04 PM, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison <matt_harbison at yahoo.com>
>> # Date 1483550016 18000
>> #      Wed Jan 04 12:13:36 2017 -0500
>> # Node ID 76d95ab94b9e206363629059fb7824002e19a9e5
>> # Parent  0064a1eb28e246ded9b726c696d048143d1b23f1
>> revset: introduce the summary predicate
>> 
>> This is similar to the 'desc()' predicate, however the search is limited to the
>> first line, which I couldn't figure out how to do with the existing
>> functionality.  Unlike 'desc()', this supports regular expressions.  The
>> 'matching()' revset already treats the summary line distinctly from the full
>> description, but that finds similar revisions instead of searching with a
>> string.
> 
> Looks like a great usecase to handle, I want that. However, are there reasons why:
> 
> 1) We could not add full pattern support to desc()
> 2) We could not make this an extra keyworkd parameters of desc()
> 
> Multiplying the revset predicate hurts discoverability, having less but more powerful predicate seems useful.

I share your concern about discoverability.

I started out editing desc(), but it's explicitly documented as case insensitive.  It seems confusing if matching for literals is case insensitive, but regex is case sensitive for the same predicate. (I vaguely recall that you can make regex case insensitive, but I think that would also surprise the user.)  I meant to mention that maybe we could add pattern support to desc() in addition to this, but forgot.  The other stuff I looked at that supports patterns, like tag(), is case sensitive for both literals and regex.  That makes sense, since those search for identifiers.

I didn't consider #2, because of the issues with #1.  Can you suggest a format for this? I don't see any other ways to slice and dice the description, so it kinda feels like the recently discussed boolean arg, i.e. the equivalent of:

    list get_matches(char *str, bool summary_only);

I'm not strongly opposed to this; I care more about the functionality than the incantation.  That said, there's been some serious bikeshedding at work about whether bug markers belong at the beginning or end of the summary.  The argument for at the beginning seems to be that it's easier to find by visually scanning each summary.  I'd like to push them towards the query tools, over eyeing things up.  Simpler and more concise will make it easier for the uninitiated.  Of course, I could probably create a shorter alias if needed.

> The name make perfect sense when you think about it at first contact "summary" made me though about "hg summary" and I was a bit confused about what this could do. I wonder if we can have something better (but questions about desc seems more important first)

I can see that, but I don't have any better ideas.  I mentioned the "summary" field in the matching() predicate to point to prior art and consistency in the revset arena.

> 
>> I'm torn on whether it should support case insensitivity, like 'desc()' and
>> 'keyword()'.  It can be handy, but it seems like it would be surprising if
>> literals and regex are handled differently, or if the regex was adjusted to be
>> case insensitive internally.
>> 
>> The use case is to be able to select bug fixes out of the summary line, while
>> ignoring similar markers within the body of the commit.  I've seen previously
>> where the bug bot will erroneously update a bug that is mentioned as a
>> parenthetical within the detail of the commit message.  To illustrate on the
>> local Mercurial repo (some old markers have an internal space):
>> 
>>  $ hg log -r "desc(r' (issue')" -T "{rev}\n" | wc -l
>>  1672
>>  $ hg log -r "desc(r' (issue')" -T "{desc|firstline}\n"  \
>>  >            | egrep '\(issue' | wc -l
>>  1633
>>  $ hg log -r "desc(r' (issue')" -T "{desc|firstline}\n"  \
>>  >            | egrep '\(issue[0-9]+\)' | wc -l
>>  1560
>>  $ hg log -r "summary(r're: \(issue\d+\)')" -T "{rev}\n" | wc -l
>>  1560
>> 
>> I'm not sure why the splitlines() logic is that complicated, but I copy/pasted
>> from templatefilters.firstline() for consistency.
>> 
>> diff --git a/mercurial/revset.py b/mercurial/revset.py
>> --- a/mercurial/revset.py
>> +++ b/mercurial/revset.py
>> @@ -2197,6 +2197,32 @@ def subrepo(repo, subset, x):
>> 
>>     return subset.filter(matches, condrepr=('<subrepo %r>', pat))
>> 
>> + at predicate('summary(string)', safe=True)
>> +def summary(repo, subset, x):
>> +    """Changesets with the given string in the summary line.  The match is case
>> +    sensitive.
>> +
>> +    If `value` starts with `re:`, the remainder of the value is treated as
>> +    a regular expression. To match a value that actually starts with `re:`,
>> +    use the prefix `literal:`.
>> +    """
>> +
>> +    # i18n: "summary" is a keyword
>> +    s = getstring(x, _("summary requires a string"))
>> +    kind, value, matcher = _substringmatcher(s)
>> +
>> +    def _matchvalue(r):
>> +        _desc = repo[r].description()
>> +        _sum = ''
>> +        try:
>> +            _sum = _desc.splitlines(True)[0].rstrip('\r\n')
>> +        except IndexError:
>> +            pass
>> +
>> +        return matcher(_sum)
>> +
>> +    return subset.filter(lambda r: _matchvalue(r), condrepr=('<summary %r>', s))
>> +
>> def _substringmatcher(pattern):
>>     kind, pattern, matcher = util.stringmatcher(pattern)
>>     if kind == 'literal':
>> diff --git a/tests/test-commit.t b/tests/test-commit.t
>> --- a/tests/test-commit.t
>> +++ b/tests/test-commit.t
>> @@ -252,13 +252,50 @@ full log
>>   commit-foo-subdir
>> 
>> 
>> +  $ cat > ../commit-log-test <<EOF
>> +  > issue fix (bug1234)
>> +  >
>> +  > multiline description here
>> +  > EOF
>> +  $ echo "edited" >> foo/plain-file
>> +  $ hg ci -l ../commit-log-test
>> +
>> +Summary doesn't search past the first line
>> +  $ hg log -r "summary('description')"
>> +
>> +Summary works with regex
>> +  $ hg log -r "summary(r're:\(bug\d+\)')"
>> +  changeset:   2:5364ddbe8265
>> +  tag:         tip
>> +  user:        test
>> +  date:        Thu Jan 01 00:00:00 1970 +0000
>> +  summary:     issue fix (bug1234)
>> +
>> +
>> +Summary works with partial matches
>> +  $ hg log -r "summary('foo')"
>> +  changeset:   0:65d4e9386227
>> +  user:        test
>> +  date:        Thu Jan 01 00:00:00 1970 +0000
>> +  summary:     commit-foo-subdir
>> +
>> +  changeset:   1:95b38e3a5b2e
>> +  user:        test
>> +  date:        Thu Jan 01 00:00:00 1970 +0000
>> +  summary:     commit-foo-dot
>> +
>> 
>> subdir log
>> 
>>   $ cd foo
>>   $ hg log .
>> +  changeset:   2:5364ddbe8265
>> +  tag:         tip
>> +  user:        test
>> +  date:        Thu Jan 01 00:00:00 1970 +0000
>> +  summary:     issue fix (bug1234)
>> +
>>   changeset:   1:95b38e3a5b2e
>> -  tag:         tip
>>   user:        test
>>   date:        Thu Jan 01 00:00:00 1970 +0000
>>   summary:     commit-foo-dot
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel at mercurial-scm.org
>> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>> 
> 
> -- 
> Pierre-Yves David