[PATCH 01 of 10] check-code: add AST check for mutable default argument values

Pierre-Yves David pierre-yves.david at ens-lyon.org
Tue Dec 27 04:57:55 EST 2016



On 12/27/2016 01:03 AM, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1482796956 25200
> #      Mon Dec 26 17:02:36 2016 -0700
> # Node ID 568a4c4064b54e039f419813017182bb077f69f0
> # Parent  dc5b594f41e9be5820ce3f197d3817379b2d3af5
> check-code: add AST check for mutable default argument values
>
> Mutable default argument values (e.g. "def foo(a=[]):" can lead
> to unwanted run-time behavior due to Python reusing the original
> value from first invocation.

I'm all for removing mutable default argument.

> This patch introduces a check-code check for the sanity of
> default argument values. It does so by parsing and walking the
> AST. Previously, all check-code checks were implemented as
> regular expressions. I probably could have implemented the check
> with regular expressions. However, AST based checks are more robust
> and arguably easier to implement for complex patterns (such as
> function argument lists).

I'm a bit worried at check-code growing too big. There is probably 
existing python analysis tool to catch this kind of error. (eg: I'm 
prtty sure pylint have a warning for this). And I would rather leverage 
that work from other people rather than attempting to maintain our own 
full featured tool.

Can you have a look in that other tools and tell us what you find?

> We issue errors for list and dict arguments, which are always bugs
> waiting to happen. We also issue errors for "attribute" ("a=x.y")
> and "subscript" ("a=x[y]") because there is no easy way to statically
> ensure the referenced value is a non-mutable type (it would require
> a lot of name resolution code at the AST layer or would require
> executing the code). There are definitely some false positives here.
> We could possibly ignore the "attribute" and "subscript" types to
> cut the noise.
>
> test-check-code.t has been updated with new failures.

The way we do this for normal code is that we fix the errors first and 
introduce the check later. (python3 have been a notable exception to this).

In particular I feel like we have a bit too many false positive here and 
having all the true positive in that changeset makes it harder to assess 
that it could be :-/

> diff --git a/contrib/check-code.py b/contrib/check-code.py
> --- a/contrib/check-code.py
> +++ b/contrib/check-code.py
> @@ -19,8 +19,10 @@ when a rule triggers wrong, do one of th
>   * ONLY use no--check-code for skipping entire files from external sources
>  """
>
>  from __future__ import absolute_import, print_function
> +
> +import ast
>  import glob
>  import keyword
>  import optparse
>  import os
> @@ -489,8 +491,54 @@ checks = [
>      ('all except for .po', r'.*(?<!\.po)$', '',
>       allfilesfilters, allfilespats),
>  ]
>
> +def checkpyast(f, source):
> +    """Perform ast checks against Python source."""
> +    errors = []
> +
> +    lines = source.splitlines()
> +
> +    class visitor(ast.NodeVisitor):
> +        def visit_FunctionDef(self, node):
> +            # Verify mutable default arguments aren't being used.
> +
> +            allowedtypes = (
> +                ast.Lambda,
> +                ast.Name,
> +                ast.Num,
> +                ast.Str,
> +                # Tuples aren't mutable, so they are safe as a default value.
> +                ast.Tuple,
> +            )
> +
> +            for d in node.args.defaults:
> +                msg = None
> +                if isinstance(d, ast.List):
> +                    msg = 'mutable default argument value (list)'
> +                elif isinstance(d, ast.Dict):
> +                    msg = 'mutable default argument value (dict)'
> +                elif isinstance(d, ast.Attribute):
> +                    msg = 'attribute default argument value may be mutable'
> +                elif isinstance(d, ast.Subscript):
> +                    msg = 'subscript default argument value may be mutable'
> +                elif isinstance(d, allowedtypes):
> +                    pass
> +                else:
> +                    msg = 'unknown default argument ast type: %s' % d
> +
> +                if msg:
> +                    errors.append((f, d.lineno, lines[d.lineno - 1], msg, ''))
> +
> +    try:
> +        root = ast.parse(source, f)
> +    except SyntaxError:
> +        return errors
> +
> +    visitor().visit(root)
> +
> +    return errors
> +
>  def _preparepats():
>      for c in checks:
>          failandwarn = c[-1]
>          for pats in failandwarn:
> @@ -643,8 +691,11 @@ def checkfile(f, logfunc=_defaultlogger.
>
>                  errors.append((f, lineno and n + 1, l, msg, bd))
>                  result = False
>
> +        if name == 'python':
> +            errors.extend(checkpyast(f, pre))
> +
>          errors.sort()
>          for e in errors:
>              logfunc(*e)
>              fc += 1
> diff --git a/tests/test-check-code.t b/tests/test-check-code.t
> --- a/tests/test-check-code.t
> +++ b/tests/test-check-code.t
> @@ -8,9 +8,48 @@ New errors are not allowed. Warnings are
>  (The writing "no-che?k-code" is for not skipping this file when checking.)
>
>    $ hg locate -X contrib/python-zstandard -X hgext/fsmonitor/pywatchman |
>    > sed 's-\\-/-g' | xargs "$check_code" --warnings --per-file=0 || false
> +  contrib/check-code.py:589:
> +   > def checkfile(f, logfunc=_defaultlogger.log, maxerr=None, warnings=False,
> +   attribute default argument value may be mutable
> +  contrib/hgclient.py:84:
> +   > def runcommand(server, args, output=sys.stdout, error=sys.stderr, input=None,
> +   attribute default argument value may be mutable
> +   attribute default argument value may be mutable
> +  doc/gendoc.py:147:
> +   > def helpprinter(ui, helptable, sectionfunc, include=[], exclude=[]):
> +   mutable default argument value (list)
> +   mutable default argument value (list)
> +  doc/hgmanpage.py:489:
> +   >                       sub=re.compile('-(?=-)').sub):
> +   attribute default argument value may be mutable
> +  doc/runrst:28:
> +   >             options={}, content=[]):
> +   mutable default argument value (dict)
> +   mutable default argument value (list)
> +  hgext/convert/common.py:58:
> +   >                  extra=None, sortkey=None, saverev=True, phase=phases.draft,
> +   attribute default argument value may be mutable
> +  hgext/mq.py:722:
> +   >                   fp=None, changes=None, opts={}):
> +   mutable default argument value (dict)
> +  hgext/rebase.py:686:
> +   > def _definesets(ui, repo, destf=None, srcf=None, basef=None, revf=[],
> +   mutable default argument value (list)
>    Skipping i18n/polib.py it has no-che?k-code (glob)
> +  mercurial/changegroup.py:260:
> +   >               targetphase=phases.draft, expectedtotal=None):
> +   attribute default argument value may be mutable
> +  mercurial/config.py:20:
> +   >     def __init__(self, data=None, includepaths=[]):
> +   mutable default argument value (list)
> +  mercurial/context.py:293:
> +   >     def match(self, pats=[], include=None, exclude=None, default='glob',
> +   mutable default argument value (list)
> +  mercurial/context.py:1498:
> +   >     def match(self, pats=[], include=None, exclude=None, default='glob',
> +   mutable default argument value (list)
>    mercurial/demandimport.py:309:
>     >     if os.environ.get('HGDEMANDIMPORT') != 'disable':
>     use encoding.environ instead (py3)
>    mercurial/encoding.py:54:
> @@ -24,14 +63,48 @@ New errors are not allowed. Warnings are
>     use encoding.environ instead (py3)
>    mercurial/encoding.py:203:
>     >                    for k, v in os.environ.items())
>     use encoding.environ instead (py3)
> +  mercurial/exchange.py:1019:
> +   > def _localphasemove(pushop, nodes, phase=phases.public):
> +   attribute default argument value may be mutable
> +  mercurial/filemerge.py:38:
> +   > def _toollist(ui, tool, part, default=[]):
> +   mutable default argument value (list)
> +  mercurial/hgweb/common.py:92:
> +   >     def __init__(self, code, message=None, headers=[]):
> +   mutable default argument value (list)
> +  mercurial/hgweb/webutil.py:145:
> +   >     def __init__(self, siblings=[], hiderev=None):
> +   mutable default argument value (list)
>    Skipping mercurial/httpclient/__init__.py it has no-che?k-code (glob)
>    Skipping mercurial/httpclient/_readers.py it has no-che?k-code (glob)
> +  mercurial/match.py:88:
> +   >     def __init__(self, root, cwd, patterns, include=[], exclude=[],
> +   mutable default argument value (list)
> +   mutable default argument value (list)
>    mercurial/policy.py:45:
>     > policy = os.environ.get('HGMODULEPOLICY', policy)
>     use encoding.environ instead (py3)
>    Skipping mercurial/statprof.py it has no-che?k-code (glob)
> +  mercurial/util.py:1808:
> +   > def strdate(string, format, defaults=[]):
> +   mutable default argument value (list)
> +  tests/run-tests.py:506:
> +   >                  timeout=defaults['timeout'],
> +   subscript default argument value may be mutable
> +  tests/run-tests.py:507:
> +   >                  startport=defaults['port'], extraconfigopts=None,
> +   subscript default argument value may be mutable
> +  tests/run-tests.py:509:
> +   >                  slowtimeout=defaults['slowtimeout'], usechg=False):
> +   subscript default argument value may be mutable
> +  tests/test-hg-parseurl.py:7:
> +   > def testparse(url, branch=[]):
> +   mutable default argument value (list)
> +  tests/test-trusted.py:175:
> +   > def assertraises(f, exc=error.Abort):
> +   attribute default argument value may be mutable
>    [1]
>
>  @commands in debugcommands.py should be in alphabetical order.
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>

-- 
Pierre-Yves David


More information about the Mercurial-devel mailing list