[RFC] checkpatch.py

Stefano Mioli jstevie at gmail.com
Sat Jul 4 07:19:58 CDT 2009

Greg Ward wrote:
> On Fri, Jul 3, 2009 at 6:59 PM, Stefano Mioli <jstevie at gmail.com> wrote:
>> - DOS-style line endings
> Hmm.  How do you write something that runs on Windows and detects CRLF 
> without causing intense annoyance?  I mean, normal text files on Windows end
> with CRLF; if they don't, many editors get very confused. Do patches
> generated from such files on Windows contain CRLF too?  Who should be
> responsible for stripping the CR on the way to the repository?

If someone is hacking on mercurial on Windows, I hope he/she uses a decent
editor (Notepad++, UltraEdit...), not something like Notepad.
Such editors can recognize all flavors of line endings and ask/allow you to
convert between them, or work with the original line endings.

But we can throw the CRLF test away, no problem.

>> Please keep in mind that I'm no Python Ninja (actually these are my very
>> first lines of Python code), so the code will probably be not that
>> pythonic. Suggestions in this regard would help too.
> Very brave of you.  I'll try to be gentle.  ;-)

Thank you so much for taking the time to write such a detailed review.

Before I start to reply, I should make it clear that this script has been
written for Mercurial developers *only*. It's not intended to be a
general-purpose coding style checker.
That is, nobody except for Mercurial developers should use it.

I'll omit things on which I have nothing to say from the quoting below.

>> +    addedline_re = re.compile(r'^\+{1,2}[^\+][\w\W]*')
> Are you trying to handle just unified diffs?  Or context diffs as well?  What
> if I add the line "+++++++" to a file?  And why [\w\W]* instead of .*?
> If you only care about unified diff, I think this regex can be stricter and
> simpler:
> addedline_re = re.compile(r'^\+(.*)')
> as long as you also skip lines that match
> fileline_re = re.compile(r'\+\+\+ b/.*')
> (Note the "b/" limits this regex to use with git- or hg-generated diffs!)

Like I said, this tool is only intended as a checker for hg export's output, so 
it should be able to handle both --git and normal output.

> Anyways, with my simpler addedline_re, you can pull out match.group(1) and do
> all your checks (leading tab, trailing whitespace, length) on it.
>> +    leadingtab_re = re.compile(r'^ *\t{1,}[\s\S]*') +    trailingwspace_re
>> = re.compile(r'[ \t]+$') +    crlf_re = re.compile(r'\r\n$')
> Not convinced you even need regexes for these.
>> +    for line in lines: 
>> +        lineindex += 1
> I have seen 'count' used for this purpose elsewhere in Hg.  And you should
> initialize 'count = 0' right before the loop.

Ok for the name. As for the initialization, IIRC that variable was initialized
right before the loop (don't have the actual code here, sorry).

>> +        r = leadingtab_re.match(line) +        if (r): +
>> errs.append('line ' + str(lineindex) + ': leading tab')
> Again, regexes are overkill here.  You could just as easily say
> if line.startswith('\t'): errs.append(...)

Hmmm... correct me if I'm wrong, but the fact that a line does not start with
a \t doesn't necessarily mean that it will not contain \t's at all before the
actual code.

IOW, something like

         ....\t  ....

where dots represent spaces.

This could happen when you are changing the indentation of pre-existing
code, and you can't catch it with startswith().

>> +    if (len(errs) == 0 and len(warns) == 0):
> Lots of Pythonistas would just say "if not (errs or warns)" here.  And even
> more, I hope, would call those variables 'errors' and 'warnings'. ;-)

However, I see the point. Nice to be able to write a condition like that.

> Looks like a good start!

Grag, again thank you very much.

I'll try to improve the script and resubmit it.


More information about the Mercurial-devel mailing list