4624 – Moderately complex (x or y or ... or z) revsets with about 350 items fail to parse after exceeding maximum recursion depth

Bug 4624 - Moderately complex (x or y or ... or z) revsets with about 350 items fail to parse after exceeding maximum recursion depth

Summary: Moderately complex (x or y or ... or z) revsets with about 350 items fail to ...

Status:	RESOLVED FIXED

Alias:	None

Product:	Mercurial
Classification:	Unclassified
Component:	Mercurial (show other bugs)
Version:	3.4-rc
Hardware:	Macintosh Mac OS

Importance:	normal bug
Assignee:	Bugzilla

URL:
Keywords:

Depends on:
Blocks:

Reported:	2015-04-25 12:39 UTC by Evan Priestley
Modified:	2015-06-11 00:00 UTC (History)
CC List:	3 users (show)

See Also:
Python Version:	---

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Evan Priestley 2015-04-25 12:39 UTC

A moderately complex revset like this (with about 350 named revisions):

```
hg log --rev 'tip - (1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1 or 1)'
```

..fails to parse:

```
** unknown exception encountered, please report by visiting
** http://mercurial.selenic.com/wiki/BugTracker
** Python 2.7.6 (default, Sep  9 2014, 15:04:36) [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)]
** Mercurial Distributed SCM (version 3.1.1+20140916)
** Extensions loaded: 
Traceback (most recent call last):
  File "/usr/local/bin/hg", line 43, in <module>
    mercurial.dispatch.run()
  File "/Library/Python/2.7/site-packages/mercurial/dispatch.py", line 28, in run
    sys.exit((dispatch(request(sys.argv[1:])) or 0) & 255)
  File "/Library/Python/2.7/site-packages/mercurial/dispatch.py", line 69, in dispatch
    ret = _runcatch(req)
  File "/Library/Python/2.7/site-packages/mercurial/dispatch.py", line 138, in _runcatch
    return _dispatch(req)
  File "/Library/Python/2.7/site-packages/mercurial/dispatch.py", line 820, in _dispatch
    cmdpats, cmdoptions)
  File "/Library/Python/2.7/site-packages/mercurial/dispatch.py", line 600, in runcommand
    ret = _runcommand(ui, options, cmd, d)
  File "/Library/Python/2.7/site-packages/mercurial/dispatch.py", line 911, in _runcommand
    return checkargs()
  File "/Library/Python/2.7/site-packages/mercurial/dispatch.py", line 882, in checkargs
    return cmdfunc()
  File "/Library/Python/2.7/site-packages/mercurial/dispatch.py", line 817, in <lambda>
    d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
  File "/Library/Python/2.7/site-packages/mercurial/util.py", line 550, in check
    return func(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/mercurial/commands.py", line 4182, in log
    revs, expr, filematcher = cmdutil.getlogrevs(repo, pats, opts)
  File "/Library/Python/2.7/site-packages/mercurial/cmdutil.py", line 1736, in getlogrevs
    revs = scmutil.revrange(repo, opts['rev'])
  File "/Library/Python/2.7/site-packages/mercurial/scmutil.py", line 565, in revrange
    m = revset.match(repo.ui, spec, repo)
  File "/Library/Python/2.7/site-packages/mercurial/revset.py", line 2058, in match
    tree = findaliases(ui, tree)
  File "/Library/Python/2.7/site-packages/mercurial/revset.py", line 2042, in findaliases
    return _expandaliases(aliases, tree, [], {})
  File "/Library/Python/2.7/site-packages/mercurial/revset.py", line 2033, in _expandaliases
    for t in tree)
  File "/Library/Python/2.7/site-packages/mercurial/revset.py", line 2033, in <genexpr>
    for t in tree)
  File "/Library/Python/2.7/site-packages/mercurial/revset.py", line 2033, in _expandaliases
    for t in tree)
  File "/Library/Python/2.7/site-packages/mercurial/revset.py", line 2033, in <genexpr>
    for t in tree)

... <snip 1300 lines> ...

  File "/Library/Python/2.7/site-packages/mercurial/revset.py", line 2033, in <genexpr>
    for t in tree)
  File "/Library/Python/2.7/site-packages/mercurial/revset.py", line 2010, in _expandaliases
    if not isinstance(tree, tuple):
RuntimeError: maximum recursion depth exceeded while calling a Python object
```

This is similar to #3604, although that avoided the issue by using a different construct internally. In the general case, this can be worked around by rewriting the revset like this:

```
((1 or 2 or ... or 256) or (257 or 258 or ... or 512))
```

...which is fine, but a little silly.

It is possible that no real human users ever run into this and that "about 350" is, practically, good enough, and the parser isn't worth fixing, even given internal cases like #3604.

For context, my use case is that I want to identify new commits on some configurable subset of branches after pulling a repository. Some users have configured this set of branches to include thousands of branches, because they use Mercurial in interesting and creative ways. I can likely use the `ancestors(...)` construct from #3604, but before finding that it wasn't obvious to me that `ancestors(x, y, ...)` would have performance similar to `x or y or ...`, as it "feels" like a more complex construction. Without special knowledge of the internals, `x or y or ...` seemed like the simplest way to express the query.

Comment 1 Yuya Nishihara 2015-04-26 05:45 UTC

This is somewhat related to the bug 4565. My current plan for
these issues are:

1. make 'addset' accept more than two sets

    <addset
      <addset
        <baseset [1]>,
        <baseset [2]>>,
      <baseset [3]>>
   to
    <addset
      <baseset [1]>,
      <baseset [2]>,
      <baseset [3]>>

2. unnest chained 'or' operations in syntax tree, immediately after
   the parsing phase

    (or
      (or
        ('symbol', '1')
        ('symbol', '2'))
      ('symbol', '3'))
   to
    (or
      ('symbol', '1')
      ('symbol', '2')
      ('symbol', '3'))

Comment 2 Matt Mackall 2015-04-27 13:05 UTC

Yuya, I approve of this plan.

Comment 3 Yuya Nishihara 2015-06-02 09:14 UTC

Fixed by https://selenic.com/repo/hg/rev/b333ca94403d
revset: reduce nesting of chained 'or' operations (issue4624)

This reduces the stack depth of chained 'or' operations:
 - from O(n) to O(1) at the parsing, alias expansion and optimization phases
 - from O(n) to O(log(n)) at the evaluation phase

simplifyinfixops() must be applied immediately after the parsing phase.
Otherwise, alias expansion would crash by "maximum recursion depth exceeded"
error.

Test cases use 'x:y|y:z' instead of 'x|y' because I'm planning to optimize
'x|y' in a different way.

Benchmarks:

0) 605b1d32c1c0
1) this patch

revset #0: 0 + 1 + 2 + ... + 200
0) wall 0.026347 comb 0.030000 user 0.030000 sys 0.000000 (best of 101)
1) wall 0.023858 comb 0.030000 user 0.030000 sys 0.000000 (best of 112)

revset #1: 0 + 1 + 2 + ... + 1000
0) maximum recursion depth exceeded
1) wall 0.483341 comb 0.480000 user 0.480000 sys 0.000000 (best of 20)

revset #2: sort(0 + 1 + 2 + ... + 200)
0) wall 0.013404 comb 0.010000 user 0.010000 sys 0.000000 (best of 196)
1) wall 0.006814 comb 0.010000 user 0.010000 sys 0.000000 (best of 375)

revset #3: sort(0 + 1 + 2 + ... + 1000)
0) maximum recursion depth exceeded
1) wall 0.035240 comb 0.040000 user 0.040000 sys 0.000000 (best of 100)

Comment 4 HG Bot 2015-06-03 14:29 UTC

Fixed by https://selenic.com/repo/hg/rev/7fbef7932af9
Yuya Nishihara <yuya@tcha.org>
revset: optimize 'or' operation of trivial revisions to a list

As seen in issue4565 and issue4624, GUI wrappers and automated scripts are
likely to generate a long query that just has numeric revisions joined by 'or'.
One reason why is that they allows users to choose arbitrary revisions from
a list. Because this use case isn't handled well by smartset, let's optimize
it to a plain old list.

Benchmarks:

1) reduce nesting of chained 'or' operations
2) optimize to a list (this patch)

revset #0: 0 + 1 + 2 + ... + 1000
1) wall 0.483341 comb 0.480000 user 0.480000 sys 0.000000 (best of 20)
2) wall 0.025393 comb 0.020000 user 0.020000 sys 0.000000 (best of 107)

revset #1: sort(0 + 1 + 2 + ... + 1000)
1) wall 0.035240 comb 0.040000 user 0.040000 sys 0.000000 (best of 100)
2) wall 0.026432 comb 0.030000 user 0.030000 sys 0.000000 (best of 102)

revset #2: first(0 + 1 + 2 + ... + 1000)
1) wall 0.028949 comb 0.030000 user 0.030000 sys 0.000000 (best of 100)
2) wall 0.025503 comb 0.030000 user 0.030000 sys 0.000000 (best of 106)

(please test the fix)

Comment 5 Bugzilla 2015-06-11 00:00 UTC

Bug was set to TESTING for 7 days, resolving