[PATCH 0 of 2] fast manifest based grep
Steve Borho
steve at borho.org
Sat May 15 23:45:35 CDT 2010
On Sat, May 15, 2010 at 8:47 PM, Matt Mackall <mpm at selenic.com> wrote:
> On Sat, 2010-05-15 at 13:42 -0500, Steve Borho wrote:
>> On Sat, May 15, 2010 at 12:55 PM, Matt Mackall <mpm at selenic.com> wrote:
>> > On Fri, 2010-05-14 at 15:54 -0500, steve at borho.org wrote:
>> >> These two patches add fast manifest based grep methods for when the
>> >> user is not interested in per-line annotation data. Especially handy
>> >> for platforms without a native grep command.
>> >
>> > Please:
>> >
>> > - remind us all / show us how the crazy old command worked
>>
>> By default, the grep command has always done a filelog search for each
>> match in order to report a changeset for each match. For example:
>>
>> % hg grep -I contrib progress
>> contrib/check-code.py:10905: (r'ui\.(status|progress|write|note)\([\'\"]x',
>> contrib/shrink-revlog.py:10724: ui.progress(_('reading'), rev, total=len(rl))
>> contrib/shrink-revlog.py:10724: ui.progress(_('reading'), None)
>> <snip>
>>
>> check-code.py was modified in 10905, but it did not involve the line
>> being shown. 10905 just happens to be the most recent revision to
>> touch that file.
>
> Yeah, that number seems.. wrong. We should either show "the last
> changeset it appears in that file" (often tip) or "the changeset where
> it most recently appeared". It's currently showing "the last changeset
> it appears in a file where that file was also modified".
>
>> Also note that grep completely ignores uncommitted
>> changes in the working copy.
>>
>> The behavior of grep -r is even less useful. Instead of searching the
>> contents of the files as they appeared in that revision, it restricts
>> the search to files modified in that revision.
>
> Yeah, I'm not sure that makes any sense. More useful behaviors would be:
>
> - search all files present in a revision
> - check if the pattern was -introduced- in any of the changes in a
> revision
>
> The current behavior is similar to the latter, but not quite.
>
>> % hg grep -r 0.9 progress
>> <nothing>
>>
>> > - tell us / show us how your change is different
>>
>> That command: 'hg grep -I contrib progress' took 2.14s on my machine.
>> After my patch, by directly searching the working directory and not
>> reporting a meaningless revision number, the same command takes 0.22s.
>>
>> 'hg grep -c . -I progress' (reading the parent revision contents
>> instead of the working copy) takes 0.20s
>>
>> grep --change rev does what the user would expect; searching the files
>> as they appeared in that revision.
>>
>> % hg grep -c 0.9 progress
>> contrib/hgk: global findinprogress
>> contrib/hgk: if {[info exists findinprogress]} {
>> contrib/hgk: unset findinprogress
>> <snip>
>>
>> > - tell us why changing the current behavior is acceptable
>>
>> In effect, these patches change four things and only the first two
>> change existing behavior.
>>
>> 1) No longer report a revision number by default
>
> How do we get them back?

By asking for annotation data. Any use of 'user', 'date', 'follow',
'rev', or 'all' will fall back to the filerev search routine that
outputs a revision field.

I think this is consistent with other commands. Specifying no
revision arguments defaults to the working directory, a --change
argument specifies a specific revision, and --rev is used to specify
ranges.
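
To make the fallback rule concrete, here is a rough sketch of the dispatch as I described it above. This is not the code from the patches: choose_search, ANNOTATION_OPTS, and the returned labels are made-up names for illustration, and opts is assumed to be a plain dict of parsed command options.

```python
# Hypothetical sketch only; none of these names come from the patches.
ANNOTATION_OPTS = ('user', 'date', 'follow', 'rev', 'all')


def choose_search(opts):
    """Return a label for the search routine a set of options would select."""
    # Any request for annotation data falls back to the old filerev
    # search, which is the routine that prints a revision field.
    if any(opts.get(name) for name in ANNOTATION_OPTS):
        return 'filerev'
    # --change REV: search the files as they appeared in that revision.
    if opts.get('change'):
        return 'manifest'
    # No revision arguments at all: search the working directory.
    return 'workingdir'
```

With this rule, choose_search({'user': True}) selects the old routine, while an empty option dict selects the working directory search.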
So adding -u gives us the original format, plus a username:

% hg grep -u -I contrib progress | grep check-code
contrib/check-code.py:10905:nicdumz: (r'ui\.(status|progress|write|note)\([\'\"]x',

Gilles suggested -r: as a way to get the old behavior, but this
actually reverses the order of the revision search:

% hg grep -r: -I contrib progress | grep check-code
contrib/check-code.py:10451: (r'ui\.(status|progress|write|note)\([\'\"]x', "unwrapped ui message"),

Revision 10451 was the revision that added that line to check-code.py.
This is probably not what a user expects either. They'll want to see
the last revision that modified this line. But the only way to get
that is to use the --all option.

So, 'hg grep -rtip:null -I contrib progress' is equivalent to the
old behavior.

% hg grep -r 0.9:1.0 -I contrib progress | grep hgk
contrib/hgk:2297: global findinprogress

It's reporting 2297, which is the first changeset after 0.9 to include
hgk, but 2297 had no effect on the match phrase.

If I can try to summarize, the old behavior is pretty confusing. With
these changes the user has a functioning "search files at a specific
revision, or the working copy". And they can still use --all, which
reports all the actual changesets that added or removed matching
lines, which was about the only useful mode of the old grep.
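
For anyone who wants the "search files at a specific revision" mode in isolation, a minimal sketch follows. It is not the patch itself: grep_files is a made-up name, and the files dict (path mapped to file text) is an assumed stand-in for whatever the manifest or working directory lookup would return.

```python
import re


def grep_files(files, pattern):
    """Return 'path:line' strings for every line matching pattern.

    files is a hypothetical dict of {path: text} standing in for the
    contents of one revision (or of the working copy).
    """
    regex = re.compile(pattern)
    hits = []
    for path in sorted(files):
        for line in files[path].splitlines():
            if regex.search(line):
                # No revision field: every hit comes from the same snapshot.
                hits.append('%s:%s' % (path, line))
    return hits
```

The point of the sketch is that nothing here walks a filelog, which is where the old default spent its time.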
--
Steve Borho
PS: I've rebased my patches and grep now returns 'not found' for both
code paths.