[RFC] checkpatch.py

Greg Ward greg-hg at gerg.ca
Fri Jul 3 19:46:15 CDT 2009


On Fri, Jul 3, 2009 at 6:59 PM, Stefano Mioli<jstevie at gmail.com> wrote:
> It only does extra-simple checks for now, namely:
>
> - leading tabs
> - trailing whitespaces
> - DOS-style line endings

Hmm.  How do you write something that runs on Windows and detects CRLF
without causing intense annoyance?  I mean, normal text files on
Windows end with CRLF; if they don't, many editors get very confused.
Do patches generated from such files on Windows contain CRLF too?  Who
should be responsible for stripping the CR on the way to the
repository?

> Please keep in mind that I'm no Python Ninja (actually these are my
> very first lines of Python code), so the code will probably be not
> that pythonic. Suggestions in this regard would help too.

Very brave of you.  I'll try to be gentle.  ;-)

> --- /dev/null   Thu Jan 01 00:00:00 1970 +0000
> +++ b/contrib/checkpatch.py     Sat Jul 04 00:02:48 2009 +0200
> @@ -0,0 +1,86 @@
> +#!/usr/bin/python

Mercurial convention is to use "/usr/bin/env python".

> +import sys, re

PEP 8 says this should be

  import sys
  import re

but the rest of Mercurial does it as you have coded it, so... whatever.

> +def checkpatch(lines):
> +
> +    errs = list()
> +    warns = list()
> +
> +    lineindex = 0
> +
> +    addedline_re = re.compile(r'^\+{1,2}[^\+][\w\W]*')

Are you trying to handle just unified diffs?  Or context diffs as
well?  What if I add the line "+++++++" to a file?  And why [\w\W]*
instead of .*?

If you only care about unified diff, I think this regex can be
stricter and simpler:

  addedline_re = re.compile(r'^\+(.*)')

as long as you also skip lines that match

  fileline_re = re.compile(r'\+\+\+ b/.*')

(Note the "b/" limits this regex to use with git- or hg-generated diffs!)

Anyways, with my simpler addedline_re, you can pull out match.group(1)
and do all your checks (leading tab, trailing whitespace, length) on
it.

> +    leadingtab_re = re.compile(r'^ *\t{1,}[\s\S]*')
> +    trailingwspace_re = re.compile(r'[ \t]+$')
> +    crlf_re = re.compile(r'\r\n$')

Not convinced you even need regexes for these.

> +    for line in lines:
> +        lineindex += 1

I have seen 'count' used for this purpose elsewhere in Hg.  And you
should initialize 'count = 0' right before the loop.

> +        r = addedline_re.match(line)
> +        if (not r):

All of your ifs have unnecessary parentheses.  The Pythonic way is

  if not r:

Oh yeah, my personal convention is that I always call the result of a
regex match 'match'.  Thus,

  match = addedline_re.match(line)
  if not match:
     ...

> +        line = line[1:]

Here's where you could use match.group(1) instead.  In this case it's
trivial, but in general it's nice to keep knowledge about string
structure in one place: the regex. (*If* it's worth using a regex,
that is.)

> +        r = leadingtab_re.match(line)
> +        if (r):
> +            errs.append('line ' + str(lineindex) + ': leading tab')

Again, regexes are overkill here.  You could just as easily say

  if line.startswith('\t'):
      errs.append(...)

Oh yeah, you need to learn about string formatting:

  errs.append('line %d: leading tab' % lineindex)

> +        r = trailingwspace_re.search(line)
> +        if (r):
> +            errs.append('line ' + str(lineindex) + ': trailing whitespace')

A common Python idiom for this is

  if line.rstrip() != line:
      errs.append(...)

> +        r = crlf_re.search(line)
> +        if (r):
> +            errs.append('line ' + str(lineindex) + ': DOS style line ending')

Again, line.endswith('\r\n') would work here... except for people
developing on Windows, who might get annoyed by it.  (See above
questions.)

> +    if (len(errs) == 0 and len(warns) == 0):

Lots of Pythonistas would just say "if not (errs or warns)" here.  And
even more, I hope, would call those variables 'errors' and 'warnings'.
 ;-)

Also, you could just print the errors/warnings as you find them and
count them for later use.  Streaminess is generally better than
batchiness.

> +        print 'patch looks good'
> +    else:
> +        errors = (len(errs) > 0)
> +        warnings = (len(warns) > 0)

Totally unnecessary: a non-empty list is true.

> +        msg = ''
> +        if (errors):
> +            msg = str(len(errs)) + ' errors'
> +        if (warnings):
> +            if (len(msg)):
> +                msg += ', '
> +            msg += str(len(warns)) + ' warnings'
> +        print msg + '\n'

Oh yeah, warnings and errors should be printed to stderr.

And the process should exit with non-zero status if there were errors.

Looks like a good start!

Greg



More information about the Mercurial-devel mailing list