[PATCH 8 of 9] grep: add support for inverting matches

Pierre-Yves David pierre-yves.david at logilab.fr
Tue Oct 16 07:05:39 CDT 2012


On Sun, Oct 14, 2012 at 10:54:25PM +0200, Idan Kamara wrote:
> # HG changeset patch
> # User Idan Kamara <idankk86 at gmail.com>
> # Date 1350073302 -7200
> # Node ID fc9b6a59bb64e8fff98671f2e6627f10bd889d0f
> # Parent  726226e2c3947c4b1bc1e5b0cd8a75e28d5f2d27
> grep: add support for inverting matches

The important information here is "what is the new command-line switch?".
Remember that your summary line will likely end up in the release changelog.

> Since we support multiline regexps the implementation is a lot more
> complicated than it could have been. We aren't going over line by
> line to find matches so we can't simply return those lines that don't
> match any of the patterns.
> 
> Instead, we take the lines between i and j where i is the last line of the
> previous match and j is the first line of the current match. Lastly, we take
> all lines after the last line of the last match.
> 
> Unfortunately we can't use the common short option -v here since that's
> taken by --verbose which can also actually change grep's output so we
> can't even make an exception.
> 
> diff --git a/mercurial/commands.py b/mercurial/commands.py
> --- a/mercurial/commands.py
> +++ b/mercurial/commands.py
> @@ -2888,6 +2888,7 @@
>      ('e', 'regexp', [],
>       _('use this pattern to find matches (must be used if a pattern'
>         ' starts with -), multiple patterns are or-ed'), _('PATTERN')),
> +    ('', 'invert-match', None, _('invert the sense of matching')),
>      ('r', 'rev', [],
>       _('only search files changed within revision range'), _('REV')),
>      ('u', 'user', None, _('list the author (long with -v)')),
> @@ -2929,6 +2930,7 @@
>      sep, eol = ':', '\n'
>      if opts.get('print0'):
>          sep = eol = '\0'
> +    invert = opts.get('invert_match')
>  
>      getfile = util.lrucachefunc(repo.file)
>  
> @@ -2987,9 +2989,42 @@
>      def grepbody(fn, rev, body):
>          matches[rev].setdefault(fn, [])
>          m = matches[rev][fn]
> +        prevlend = 0
>          for lnum, lstart, lend, cstart, cend, line in matchlines(body):
> -            s = linestate(line, lnum, lstart, lend, cstart, cend)
> -            m.append(s)
> +            if invert:
> +                if lstart - 1 - prevlend > 0:
> +                    lookback = body[prevlend:lstart - 1]
> +                    inverts = lookback.split('\n')

Are you sure '\n' is enough? Are '\r' characters converted automatically, or
are you just missing them here?

Consider using the `splitlines` method on str instead.

Also: `invert` (bool) vs. `inverts` (list of strings). Such similar variable
names are a recipe for disaster.
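
To illustrate the `splitlines` point, here is a minimal sketch. The sample
string is made up for the example and is not taken from the patch or the test
suite:

```python
# Hypothetical file content with Windows-style line endings.
body = "export\r\nvaportight\r\nimport/export\r\n"

# split('\n') leaves the '\r' on every line and adds a trailing empty string.
by_newline = body.split('\n')

# splitlines() understands '\n', '\r\n' and '\r' and drops the terminators.
by_splitlines = body.splitlines()

# splitlines(True) keeps the terminators, so the line lengths still add up
# to len(body); useful when byte offsets are computed from the pieces.
with_ends = body.splitlines(True)

print(by_newline)
print(by_splitlines)
print(sum(len(l) for l in with_ends) == len(body))
```

Note that the trailing empty string from `split('\n')` would show up as an
extra empty "line", and that dropping terminators changes the offset
arithmetic, so any `lstart`/`lend` computation would have to account for that.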


> +                    invertedlnum = lnum - len(inverts)
> +                    invertedlstart = invertedlend = lstart - 1
> +                    for inv in inverts:
> +                        invertedlstart -= len(inv)
> +                        s = linestate(inv, invertedlnum, invertedlstart,
> +                                      invertedlend, -1, -1)
> +                        m.append(s)
> +
> +                        invertedlnum += 1
> +                        invertedlend = invertedlstart - 2
> +                        invertedlstart = invertedlend

The code looks a bit complicated. Why don't you just iterate over an
enumeration of all lines, ignoring those that contain a match?

(I'm not saying you are doing it wrong. I'm just curious why you took this
complicated approach.)
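
Something along these lines is what I have in mind. This is only a rough,
standalone sketch: `inverted_lines` is a hypothetical helper, it uses plain
`re` instead of the existing `matchlines` generator, and it does not build
`linestate` objects. It marks every line touched by a (possibly multiline)
match and then returns the rest:

```python
import bisect
import re


def inverted_lines(body, pattern):
    """Return (lnum, line) pairs for lines not touched by any match."""
    lines = body.splitlines(True)  # keep terminators so offsets add up

    # Offset in body at which each line starts.
    starts = []
    pos = 0
    for l in lines:
        starts.append(pos)
        pos += len(l)

    # Collect the indexes of every line covered by a match.
    matched = set()
    for m in re.finditer(pattern, body, re.MULTILINE):
        first = bisect.bisect_right(starts, m.start()) - 1
        # m.end() is exclusive; guard against zero-width matches.
        lastoff = max(m.start(), m.end() - 1)
        last = bisect.bisect_right(starts, lastoff) - 1
        matched.update(range(first, last + 1))

    return [(i + 1, l.rstrip('\r\n'))
            for i, l in enumerate(lines) if i not in matched]


body = "export\nvaportight\nimport/export\n"
print(inverted_lines(body, r'vapor'))
print(inverted_lines(body, r'ght\nimp'))
print(inverted_lines(body, r'port'))
```

With this shape there is no lookback/lookahead bookkeeping, and a match
spanning several lines simply removes all of them from the output.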

> +            else:
> +                s = linestate(line, lnum, lstart, lend, cstart, cend)
> +                m.append(s)
> +            prevlend = lend
> +        if invert and len(m) and lend != len(body):
> +            lookahead = body[lend + 1:]
> +            if lookahead:
> +                inverts = lookahead.split('\n')

Same remark about `splitlines` here.

> +                invertedlnum = lnum + 1
> +                invertedlstart = invertedlend = lend + 1
> +                for inv in inverts:
> +                    invertedlend += len(inv)
> +                    s = linestate(inv, invertedlnum, invertedlstart,
> +                                  invertedlend, -1, -1)
> +                    m.append(s)
> +
> +                    invertedlnum += 1
> +                    invertedlstart = invertedlend + 1
> +                    invertedlend = invertedlstart
>  
>      def difflinestates(a, b):
>          sm = difflib.SequenceMatcher(None, a, b)
> @@ -3054,9 +3089,17 @@
>                  if not opts.get('text') and binary():
>                      ui.write(" Binary file matches")
>                  else:
> -                    ui.write(before)
> -                    ui.write(match, label='grep.match')
> -                    ui.write(after)
> +                    # We don't highlight anything on matches that were
> +                    # inverted.
> +                    if l.colstart == -1:
> +                        ui.write(l.line)
> +                    else:
> +                        before = l.line[:l.colstart]
> +                        match = l.line[l.colstart:l.colend]
> +                        after = l.line[l.colend:]
> +                        ui.write(before)
> +                        ui.write(match, label='grep.match')
> +                        ui.write(after)
>              ui.write(eol)
>              found = True
>          return found
> diff --git a/tests/test-debugcomplete.t b/tests/test-debugcomplete.t
> --- a/tests/test-debugcomplete.t
> +++ b/tests/test-debugcomplete.t
> @@ -250,7 +250,7 @@
>    debugwalk: include, exclude
>    debugwireargs: three, four, five, ssh, remotecmd, insecure
>    graft: rev, continue, edit, log, currentdate, currentuser, date, user, tool, dry-run
> -  grep: print0, all, text, follow, ignore-case, files-with-matches, line-number, no-filename, regexp, rev, user, date, include, exclude
> +  grep: print0, all, text, follow, ignore-case, files-with-matches, line-number, no-filename, regexp, invert-match, rev, user, date, include, exclude
>    heads: rev, topo, active, closed, style, template
>    help: extension, command, keyword
>    identify: rev, num, id, branch, tags, bookmarks, ssh, remotecmd, insecure
> diff --git a/tests/test-grep.t b/tests/test-grep.t
> --- a/tests/test-grep.t
> +++ b/tests/test-grep.t
> @@ -39,6 +39,18 @@
>    4:vaportight
>    4:import/export
>  
> +invert
> +  $ hg grep --invert-match port port
> +  [1]
> +  $ hg grep --invert-match -e exp -e vap -e imp port
> +  [1]
> +  $ hg grep -n --invert-match vapor port
> +  port:4:1:export
> +  port:4:3:import/export
> +  port:4:4:
> +  $ hg grep -n --invert-match 'ght\nimp' port
> +  port:4:1:export
> +
>  simple with color
>  
>    $ hg --config extensions.color= grep --config color.mode=ansi \

-- 
Pierre-Yves David

http://www.logilab.fr/
