[PATCH 01 of 10] check-code: add AST check for mutable default argument values

Pierre-Yves David pierre-yves.david at ens-lyon.org
Wed Jan 4 10:14:56 EST 2017


For the record, I'm overall strong +1 on Augie reply to Jun here.

I would like to add something about the implicit None testing to make my 
position clearer.

Implicit None testings are also bugs waiting to happens. In my 
programmer life I've lost days hunting down very strange bugs that ended 
up being cause by implicit None testing. So the current series from Greg 
is replacing one bug waiting to happen (mutable default value) with 
another bug waiting to happen (implicit None testing). As a result, even 
if the general intend is great (and I want to see it happen) the 
proposed implementation is at best a zero-sum move and probably less 
than that because tracking mutable default to fix them is easy while 
tracking implicit None check is much harder.

So we should not take this V1, but I'm looking forward to see a V2 
cleaning the mutable default argument without introducing implicit None 
checking.

Cheers,

On 12/28/2016 06:18 PM, Augie Fackler wrote:
>
>> On Dec 28, 2016, at 7:37 AM, Jun Wu <quark at fb.com> wrote:
>>
>> With this change,
>>
>>  - Code becomes longer (if None is tested explicitly)
>
> Yep.
>
>>  - Slower, extra tests and GCs (especially in a tight loop)
>
> Py_None will at worst get decref'd. It’s a singleton, so there won’t be any extra GC’ing. `is` checks in cpython are extremely fast (they’re exact pointer comparisons, so they’re bafflingly cheap compared to pretty much any other Python operation).
>
>>  - "None" passed explicitly is ignored
>
> This is easily remedied with a module-local singleton, e.g. _notset = object() and then use _notset instead of None for the sentinel. However: it’s idiomatic Python to use None as the “no value” part of a Maybe (to the point that pytype and mypy use typing.Optional[foo] to indicate ‘a foo or None’), and in general I’m strongly opposed to magic behavior when explicitly passing None. It’s like having magic behavior when passing NULL as a pointer value.
>
>> I'm aware that some other checkers do the same check but it does not look
>> the best way to avoid these kinds of errors due to the above issues.
>
> pylint does this check, so in a perfect world we’ll just use that. I’d still like to see these patches accepted (other than the check-code part since we could use pylint instead), since it’s better code.
>
>> I think a better approach is to define "frozenlist" (could we just use
>> tuple?), "frozendict" and use them. If we can run AST transformation in
>> modular loader for Python 2, we can just replace default {} and [] to their
>> "frozen" versions.
>
> Ehhh, I’m pretty deeply uncomfortable putting more magic in our AST transformer. I’ve still got a vision of how we can make some blame improvements and then actually do a one-time conversion of things like string constants and drop the AST mangler.
>
>>
>> Excerpts from Gregory Szorc's message of 2016-12-26 17:03:07 -0700:
>>> # HG changeset patch
>>> # User Gregory Szorc <gregory.szorc at gmail.com>
>>> # Date 1482796956 25200
>>> #      Mon Dec 26 17:02:36 2016 -0700
>>> # Node ID 568a4c4064b54e039f419813017182bb077f69f0
>>> # Parent  dc5b594f41e9be5820ce3f197d3817379b2d3af5
>>> check-code: add AST check for mutable default argument values
>>>
>>> Mutable default argument values (e.g. "def foo(a=[]):" can lead
>>> to unwanted run-time behavior due to Python reusing the original
>>> value from first invocation.
>>>
>>> This patch introduces a check-code check for the sanity of
>>> default argument values. It does so by parsing and walking the
>>> AST. Previously, all check-code checks were implemented as
>>> regular expressions. I probably could have implemented the check
>>> with regular expressions. However, AST based checks are more robust
>>> and arguably easier to implement for complex patterns (such as
>>> function argument lists).
>>>
>>> We issue errors for list and dict arguments, which are always bugs
>>> waiting to happen. We also issue errors for "attribute" ("a=x.y")
>>> and "subscript" ("a=x[y]") because there is no easy way to statically
>>> ensure the referenced value is a non-mutable type (it would require
>>> a lot of name resolution code at the AST layer or would require
>>> executing the code). There are definitely some false positives here.
>>> We could possibly ignore the "attribute" and "subscript" types to
>>> cut the noise.
>>>
>>> test-check-code.t has been updated with new failures.
>>>
>>> diff --git a/contrib/check-code.py b/contrib/check-code.py
>>> --- a/contrib/check-code.py
>>> +++ b/contrib/check-code.py
>>> @@ -19,8 +19,10 @@ when a rule triggers wrong, do one of th
>>>  * ONLY use no--check-code for skipping entire files from external sources
>>> """
>>>
>>> from __future__ import absolute_import, print_function
>>> +
>>> +import ast
>>> import glob
>>> import keyword
>>> import optparse
>>> import os
>>> @@ -489,8 +491,54 @@ checks = [
>>>     ('all except for .po', r'.*(?<!\.po)$', '',
>>>      allfilesfilters, allfilespats),
>>> ]
>>>
>>> +def checkpyast(f, source):
>>> +    """Perform ast checks against Python source."""
>>> +    errors = []
>>> +
>>> +    lines = source.splitlines()
>>> +
>>> +    class visitor(ast.NodeVisitor):
>>> +        def visit_FunctionDef(self, node):
>>> +            # Verify mutable default arguments aren't being used.
>>> +
>>> +            allowedtypes = (
>>> +                ast.Lambda,
>>> +                ast.Name,
>>> +                ast.Num,
>>> +                ast.Str,
>>> +                # Tuples aren't mutable, so they are safe as a default value.
>>> +                ast.Tuple,
>>> +            )
>>> +
>>> +            for d in node.args.defaults:
>>> +                msg = None
>>> +                if isinstance(d, ast.List):
>>> +                    msg = 'mutable default argument value (list)'
>>> +                elif isinstance(d, ast.Dict):
>>> +                    msg = 'mutable default argument value (dict)'
>>> +                elif isinstance(d, ast.Attribute):
>>> +                    msg = 'attribute default argument value may be mutable'
>>> +                elif isinstance(d, ast.Subscript):
>>> +                    msg = 'subscript default argument value may be mutable'
>>> +                elif isinstance(d, allowedtypes):
>>> +                    pass
>>> +                else:
>>> +                    msg = 'unknown default argument ast type: %s' % d
>>> +
>>> +                if msg:
>>> +                    errors.append((f, d.lineno, lines[d.lineno - 1], msg, ''))
>>> +
>>> +    try:
>>> +        root = ast.parse(source, f)
>>> +    except SyntaxError:
>>> +        return errors
>>> +
>>> +    visitor().visit(root)
>>> +
>>> +    return errors
>>> +
>>> def _preparepats():
>>>     for c in checks:
>>>         failandwarn = c[-1]
>>>         for pats in failandwarn:
>>> @@ -643,8 +691,11 @@ def checkfile(f, logfunc=_defaultlogger.
>>>
>>>                 errors.append((f, lineno and n + 1, l, msg, bd))
>>>                 result = False
>>>
>>> +        if name == 'python':
>>> +            errors.extend(checkpyast(f, pre))
>>> +
>>>         errors.sort()
>>>         for e in errors:
>>>             logfunc(*e)
>>>             fc += 1
>>> diff --git a/tests/test-check-code.t b/tests/test-check-code.t
>>> --- a/tests/test-check-code.t
>>> +++ b/tests/test-check-code.t
>>> @@ -8,9 +8,48 @@ New errors are not allowed. Warnings are
>>> (The writing "no-che?k-code" is for not skipping this file when checking.)
>>>
>>>   $ hg locate -X contrib/python-zstandard -X hgext/fsmonitor/pywatchman |
>>>> sed 's-\\-/-g' | xargs "$check_code" --warnings --per-file=0 || false
>>> +  contrib/check-code.py:589:
>>> +   > def checkfile(f, logfunc=_defaultlogger.log, maxerr=None, warnings=False,
>>> +   attribute default argument value may be mutable
>>> +  contrib/hgclient.py:84:
>>> +   > def runcommand(server, args, output=sys.stdout, error=sys.stderr, input=None,
>>> +   attribute default argument value may be mutable
>>> +   attribute default argument value may be mutable
>>> +  doc/gendoc.py:147:
>>> +   > def helpprinter(ui, helptable, sectionfunc, include=[], exclude=[]):
>>> +   mutable default argument value (list)
>>> +   mutable default argument value (list)
>>> +  doc/hgmanpage.py:489:
>>> +   >                       sub=re.compile('-(?=-)').sub):
>>> +   attribute default argument value may be mutable
>>> +  doc/runrst:28:
>>> +   >             options={}, content=[]):
>>> +   mutable default argument value (dict)
>>> +   mutable default argument value (list)
>>> +  hgext/convert/common.py:58:
>>> +   >                  extra=None, sortkey=None, saverev=True, phase=phases.draft,
>>> +   attribute default argument value may be mutable
>>> +  hgext/mq.py:722:
>>> +   >                   fp=None, changes=None, opts={}):
>>> +   mutable default argument value (dict)
>>> +  hgext/rebase.py:686:
>>> +   > def _definesets(ui, repo, destf=None, srcf=None, basef=None, revf=[],
>>> +   mutable default argument value (list)
>>>   Skipping i18n/polib.py it has no-che?k-code (glob)
>>> +  mercurial/changegroup.py:260:
>>> +   >               targetphase=phases.draft, expectedtotal=None):
>>> +   attribute default argument value may be mutable
>>> +  mercurial/config.py:20:
>>> +   >     def __init__(self, data=None, includepaths=[]):
>>> +   mutable default argument value (list)
>>> +  mercurial/context.py:293:
>>> +   >     def match(self, pats=[], include=None, exclude=None, default='glob',
>>> +   mutable default argument value (list)
>>> +  mercurial/context.py:1498:
>>> +   >     def match(self, pats=[], include=None, exclude=None, default='glob',
>>> +   mutable default argument value (list)
>>>   mercurial/demandimport.py:309:
>>>>    if os.environ.get('HGDEMANDIMPORT') != 'disable':
>>>    use encoding.environ instead (py3)
>>>   mercurial/encoding.py:54:
>>> @@ -24,14 +63,48 @@ New errors are not allowed. Warnings are
>>>    use encoding.environ instead (py3)
>>>   mercurial/encoding.py:203:
>>>>                   for k, v in os.environ.items())
>>>    use encoding.environ instead (py3)
>>> +  mercurial/exchange.py:1019:
>>> +   > def _localphasemove(pushop, nodes, phase=phases.public):
>>> +   attribute default argument value may be mutable
>>> +  mercurial/filemerge.py:38:
>>> +   > def _toollist(ui, tool, part, default=[]):
>>> +   mutable default argument value (list)
>>> +  mercurial/hgweb/common.py:92:
>>> +   >     def __init__(self, code, message=None, headers=[]):
>>> +   mutable default argument value (list)
>>> +  mercurial/hgweb/webutil.py:145:
>>> +   >     def __init__(self, siblings=[], hiderev=None):
>>> +   mutable default argument value (list)
>>>   Skipping mercurial/httpclient/__init__.py it has no-che?k-code (glob)
>>>   Skipping mercurial/httpclient/_readers.py it has no-che?k-code (glob)
>>> +  mercurial/match.py:88:
>>> +   >     def __init__(self, root, cwd, patterns, include=[], exclude=[],
>>> +   mutable default argument value (list)
>>> +   mutable default argument value (list)
>>>   mercurial/policy.py:45:
>>>> policy = os.environ.get('HGMODULEPOLICY', policy)
>>>    use encoding.environ instead (py3)
>>>   Skipping mercurial/statprof.py it has no-che?k-code (glob)
>>> +  mercurial/util.py:1808:
>>> +   > def strdate(string, format, defaults=[]):
>>> +   mutable default argument value (list)
>>> +  tests/run-tests.py:506:
>>> +   >                  timeout=defaults['timeout'],
>>> +   subscript default argument value may be mutable
>>> +  tests/run-tests.py:507:
>>> +   >                  startport=defaults['port'], extraconfigopts=None,
>>> +   subscript default argument value may be mutable
>>> +  tests/run-tests.py:509:
>>> +   >                  slowtimeout=defaults['slowtimeout'], usechg=False):
>>> +   subscript default argument value may be mutable
>>> +  tests/test-hg-parseurl.py:7:
>>> +   > def testparse(url, branch=[]):
>>> +   mutable default argument value (list)
>>> +  tests/test-trusted.py:175:
>>> +   > def assertraises(f, exc=error.Abort):
>>> +   attribute default argument value may be mutable
>>>   [1]
>>>
>>> @commands in debugcommands.py should be in alphabetical order.
>>>
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel at mercurial-scm.org
>> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>

-- 
Pierre-Yves David


More information about the Mercurial-devel mailing list