Fwd: [PATCH 0 of 2] fast manifest based grep

Steve Borho steve at borho.org
Sat May 15 14:27:46 CDT 2010


Sorry, I did not intend to leave the mailing list out of this response.


---------- Forwarded message ----------
From: Steve Borho <steve at borho.org>
Date: Sat, May 15, 2010 at 2:26 PM
Subject: Re: [PATCH 0 of 2] fast manifest based grep
To: Matt Mackall <mpm at selenic.com>


On Sat, May 15, 2010 at 1:42 PM, Steve Borho <steve at borho.org> wrote:
> On Sat, May 15, 2010 at 12:55 PM, Matt Mackall <mpm at selenic.com> wrote:
>> On Fri, 2010-05-14 at 15:54 -0500, steve at borho.org wrote:
>>> These two patches add fast manifest based grep methods for when the
>>> user is not interested in per-line annotation data.  Especially handy
>>> for platforms without a native grep command.
>>
>> Please:
>>
>> - remind us all / show us how the crazy old command worked
>
> By default, the grep command has always done a filelog search for each
> match in order to report a changeset for each match.  For example:
>
> % hg grep -I contrib progress
> contrib/check-code.py:10905:    (r'ui\.(status|progress|write|note)\([\'\"]x',
> contrib/shrink-revlog.py:10724:            ui.progress(_('reading'), rev, total=len(rl))
> contrib/shrink-revlog.py:10724:        ui.progress(_('reading'), None)
> <snip>
>
> check-code.py was modified in 10905, but it did not involve the line
> being shown.  10905 just happens to be the most recent revision to
> touch that file.  Also note that grep completely ignores uncommitted
> changes in the working copy.
>
> The behavior of grep -r is even less useful.  Instead of searching the
> contents of the files as they appeared in that revision, it restricts
> the search to files modified in that revision.
>
> % hg grep -r 0.9 progress
> <nothing>
>
>> - tell us / show us how your change is different
>
> That command:  'hg grep -I contrib progress' took 2.14s on my machine.
> After my patch, by directly searching the working directory and not
> reporting a meaningless revision number, the same command takes 0.22s.
>
> 'hg grep -c . -I progress' (reading the parent revision contents
> instead of the working copy) takes 0.20s

That should have been 'hg grep -c . -I contrib progress'.  I'll also
note that someone with more regexp fu than myself could probably
improve the search times in the new algorithm.
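
For readers who want a feel for the approach, here is a minimal sketch of
a grep that reports only "file:line" and never looks up a revision.  It is
not the code from the patches: grep_files is a hypothetical name, and the
files argument (a dict mapping path to text) stands in for whatever the
working directory or a manifest would supply.

```python
import re


def grep_files(files, pattern):
    """Yield 'path:line' for every line matching pattern.

    files is assumed to be a dict of {path: text}; a real implementation
    would read contents from the working directory or a manifest.
    """
    regexp = re.compile(pattern)  # compile once, reuse for every file
    for path in sorted(files):
        # splitlines() also returns a final line with no trailing newline
        for line in files[path].splitlines():
            if regexp.search(line):
                yield "%s:%s" % (path, line)


if __name__ == "__main__":
    sample = {
        "port": "export\nvaportight\nimport/export\n",
        "noeol": "no infinite loop",
    }
    for hit in grep_files(sample, "port"):
        print(hit)
```

Because no per-match history lookup is done, the cost is one pass over
each file's contents.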

>
> grep --change rev does what the user would expect; searching the files
> as they appeared in that revision.
>
> % hg grep -c 0.9 progress
> contrib/hgk:    global findinprogress
> contrib/hgk:    if {[info exists findinprogress]} {
> contrib/hgk:    unset findinprogress
> <snip>
>
>> - tell us why changing the current behavior is acceptable
>
> In effect, these patches change four things and only the first two
> change existing behavior.
>
> 1) No longer report a revision number by default
> 2) Include working copy changes by default
> 3) Add a --change argument that does what users expect
> 4) Add a progress bar to the default and changeset searches
>
> According to informal polls on IRC, not many people use the grep
> command because of its current quirks.  I think my changes make the
> behavior more useful, discoverable, and much faster.

I found a bit of a surprise when I ran the test suite:

--- /home/steve/tools/crew/tests/test-grep.out
+++ /home/steve/tools/crew/tests/test-grep.err
@@ -1,13 +1,13 @@
 % pattern error
 grep: invalid match pattern: nothing to repeat
 % simple
-port:4:export
-port:4:vaportight
-port:4:import/export
+port:export
+port:vaportight
+port:import/export
<snip a lot of similar changes from rev now missing>
 % match in last line without newline
 adding noeol
 % last character omitted in output to avoid infinite loop
-noeol:4:no infinite loo
+noeol:no infinite loop
 % issue 685
<snip uninteresting context>

So apparently my new version does not have this infinite loop bug and
hacky workaround.
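
To illustrate the "noeol" case, here is a sketch of a matcher that scans
a whole file body and still terminates, with the full last line intact,
when the file does not end in a newline.  This is an illustration only,
not the code in either the old or the new grep; match_lines is a
hypothetical name, and the sketch assumes the pattern never matches
across a newline.

```python
import re


def match_lines(body, regexp):
    """Yield (line_number, line_text) for each line containing a match.

    Assumes regexp never matches a newline character.
    """
    pos = 0       # always the start of a line
    linenum = 0   # number of the last line reported or skipped
    while True:
        m = regexp.search(body, pos)
        if m is None:
            break
        # Count the lines skipped between pos and the match, plus this one.
        linenum += body.count("\n", pos, m.start()) + 1
        line_start = body.rfind("\n", 0, m.start()) + 1
        newline = body.find("\n", m.end())
        # No newline after the match: the line runs to the end of the body.
        line_end = len(body) if newline == -1 else newline
        yield linenum, body[line_start:line_end]
        if newline == -1:
            break  # last line consumed, nothing left to search
        pos = newline + 1


if __name__ == "__main__":
    for num, text in match_lines("no infinite loop", re.compile("loop")):
        print("noeol:%d:%s" % (num, text))
```

The explicit break when no newline follows the match is what ends the
loop, so no character has to be dropped from the output.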

--
Steve Borho

