[PATCH 0 of 2] fast manifest based grep

Steve Borho steve at borho.org
Sat May 15 23:45:35 CDT 2010


On Sat, May 15, 2010 at 8:47 PM, Matt Mackall <mpm at selenic.com> wrote:
> On Sat, 2010-05-15 at 13:42 -0500, Steve Borho wrote:
>> On Sat, May 15, 2010 at 12:55 PM, Matt Mackall <mpm at selenic.com> wrote:
>> > On Fri, 2010-05-14 at 15:54 -0500, steve at borho.org wrote:
>> >> These two patches add fast manifest based grep methods for when the
>> >> user is not interested in per-line annotation data.  Especially handy
>> >> for platforms without a native grep command.
>> >
>> > Please:
>> >
>> > - remind us all / show us how the crazy old command worked
>>
>> By default, the grep command has always done a filelog search for each
>> match in order to report a changeset for each match.  For example:
>>
>> % hg grep -I contrib progress
>> contrib/check-code.py:10905:    (r'ui\.(status|progress|write|note)\([\'\"]x',
>> contrib/shrink-revlog.py:10724:            ui.progress(_('reading'), rev, total=len(rl))
>> contrib/shrink-revlog.py:10724:        ui.progress(_('reading'), None)
>> <snip>
>>
>> check-code.py was modified in 10905, but it did not involve the line
>> being shown.  10905 just happens to be the most recent revision to
>> touch that file.
>
> Yeah, that number seems.. wrong. We should either show "the last
> changeset it appears in that file" (often tip) or "the changeset where
> it most recently appeared". It's currently showing "the last changeset
> it appears in a file where that file was also modified".
>
>>   Also note that grep completely ignores uncommitted
>> changes in the working copy.
>>
>> The behavior of grep -r is even less useful.  Instead of searching the
>> contents of the files as they appeared in that revision, it restricts
>> the search to files modified in that revision.
>
> Yeah, I'm not sure that makes any sense. More useful behaviors would be:
>
> - search all files present in a revision
> - check if the pattern was -introduced- in any of the changes in a
> revision
>
> The current behavior is similar to the latter, but not quite.
>
>> % hg grep -r 0.9 progress
>> <nothing>
>>
>> > - tell us / show us how your change is different
>>
>> That command:  'hg grep -I contrib progress' took 2.14s on my machine.
>>  After my patch, by directly searching the working directory and not
>> reporting a meaningless revision number, the same command takes 0.22s.
>>
>> 'hg grep -c . -I contrib progress' (reading the parent revision contents
>> instead of the working copy) takes 0.20s
>>
>> grep --change rev does what the user would expect; searching the files
>> as they appeared in that revision.
>>
>> % hg grep -c 0.9 progress
>> contrib/hgk:    global findinprogress
>> contrib/hgk:    if {[info exists findinprogress]} {
>> contrib/hgk:  unset findinprogress
>> <snip>
>>
>> > - tell us why changing the current behavior is acceptable
>>
>> In effect, these patches change four things and only the first two
>> change existing behavior.
>>
>> 1) No longer report a revision number by default
>
> How do we get them back?

By asking for annotation data.  Any use of --user, --date, --follow,
--rev, or --all falls back to the filerev search routine, which
outputs a revision field.

I think this is consistent with other commands.  Specifying no
revision arguments defaults to the working directory, a --change
argument specifies a specific revision, and --rev is used to specify
ranges.
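
To make the fallback rule concrete, here is a rough sketch of the
dispatch in Python.  It is not the code from the patches: the function
name and the returned labels are made up for illustration, and letting
the annotation options win over --change when both are given is my
assumption.

```python
# Hypothetical sketch, not the patch itself.  The option names mirror
# the ones listed above; everything else is invented for illustration.
ANNOTATION_OPTS = ("user", "date", "follow", "rev", "all")


def choose_search(opts):
    """Pick a search strategy from a dict of parsed command options."""
    # Any request for annotation data needs the filelog walk, which is
    # the only routine that can report a revision for each match.
    # Assumption: these options take precedence over --change.
    if any(opts.get(name) for name in ANNOTATION_OPTS):
        return "filelog"
    # --change names one revision: search the files in its manifest.
    if opts.get("change"):
        return "manifest"
    # No revision arguments at all: search the working directory.
    return "workingdir"
```

With this, choose_search({}) picks the working directory,
choose_search({"change": "0.9"}) picks the manifest search, and
choose_search({"user": True}) falls back to the filelog search.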

So adding -u gives us the original format, plus a username:

% hg grep -u -I contrib progress | grep check-code
contrib/check-code.py:10905:nicdumz:    (r'ui\.(status|progress|write|note)\([\'\"]x',

Gilles suggested -r: as a way to get the old behavior, but this
actually reverses the order of the revision search:

% hg grep -r: -I contrib progress | grep check-code
contrib/check-code.py:10451:    (r'ui\.(status|progress|write|note)\([\'\"]x', "unwrapped ui message"),

Revision 10451 is the revision that added that line to check-code.py.
This is probably not what a user expects either; they'll want to see
the last revision that modified this line.  But the only way to get
that is to use the --all option.

So 'hg grep -rtip:null -I contrib progress' is equivalent to the old
behavior.
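
The effect of the search order can be shown with a toy model in
Python.  This is only a sketch of the behavior described above, not
Mercurial's implementation: the function name is hypothetical, and the
file contents are invented (only the two revision numbers come from
the examples in this mail).

```python
import re


def first_matching_rev(file_revs, pattern, newest_first=True):
    """Return (rev, line) for the first file revision, in search order,
    that contains a line matching pattern, or None if nothing matches.

    file_revs is a list of (revision_number, file_text), oldest first.
    """
    regex = re.compile(pattern)
    ordered = reversed(file_revs) if newest_first else file_revs
    for rev, text in ordered:
        for line in text.splitlines():
            if regex.search(line):
                return rev, line
    return None


# Invented history: the matching line is added in 10451 and is left
# untouched when the file is modified again in 10905.
history = [
    (10451, "ui.progress('x')\n"),
    (10905, "ui.progress('x')\nunrelated = 1\n"),
]
```

Searching newest first reports 10905, the last revision to touch the
file, even though that revision did not change the matching line.
Searching oldest first (as -r: does) reports 10451, the revision that
added it.  Neither order answers "which revision last modified this
line".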

% hg grep -r 0.9:1.0 -I contrib progress | grep hgk
contrib/hgk:2297:    global findinprogress

It's reporting 2297, which is the first changeset after 0.9 to include
hgk, but 2297 had no effect on the match phrase.

If I can try to summarize, the old behavior is pretty confusing.  With
these changes the user has a functioning "search files at a specific
revision, or the working copy" mode.  And they can still use --all,
which reports all the actual changesets that added or removed matching
lines, which was about the only useful mode of the old grep.
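
For reference, the new default mode boils down to something like the
following Python sketch.  It assumes the files of one revision (or of
the working copy) are already available as a plain dict of path to
text; the function name and the sample data are hypothetical, and the
real patches read from the manifest or the working directory instead.

```python
import re


def grep_snapshot(files, pattern):
    """Yield (path, line) for each matching line in a {path: text} dict."""
    regex = re.compile(pattern)
    for path in sorted(files):
        for line in files[path].splitlines():
            if regex.search(line):
                # No revision field: only the file name and the line.
                yield path, line


# Invented sample data standing in for the files of one revision.
snapshot = {
    "contrib/hgk": "    global findinprogress\nset x 1\n",
    "README": "nothing here\n",
}

for path, line in grep_snapshot(snapshot, "progress"):
    print("%s:%s" % (path, line))
```

Because there is a single snapshot and no history walk, there is no
revision number to report, which is where the speedup comes from.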

--
Steve Borho

PS: I've rebased my patches and grep now returns 'not found' for both
code paths.

