[PATCH] revset: introduce optional 'while' predicate for ancestors()

Sat Oct 11 21:03:35 CDT 2014

On 10/11/2014 05:45 PM, Mads Kiilerich wrote:
> On 10/11/2014 02:33 AM, Pierre-Yves David wrote:
>> On 10/07/2014 05:22 PM, Mads Kiilerich wrote:
>>> # HG changeset patch
>>> # User Mads Kiilerich <madski at unity3d.com>
>>> # Date 1412727753 -7200
>>> #      Wed Oct 08 02:22:33 2014 +0200
>>> # Node ID 7c48c97a07b865c86a75562f94656a64a8506273
>>> # Parent  564ae7d2ec9bee86b00a6ba817271ac0b19deca7
>>> revset: introduce optional 'while' predicate for ancestors()
>>>
>>> When specifying a 'while' set, ancestors() will now only visit parents that are
>>> in that set. This makes it possible to prune while doing an ancestor traversal
>>> and reduce the number of membership tests.  Such a pruning is very convenient
>>> when expensive checks are involved.
>>
>> Good feature, not really possible to achieve the same result easily
>> (and definitely not with the same perf)
>>
>>>
>>> The primary initial use case for this feature is that filtering on branch name
>>> is so expensive. Often it is just as relevant to prune everything not on the
>>> branch.
>>>
>>> Example:
>>>
>>>    $ hg --time debugrevspec 'branch(.)' | wc -l
>>>    time: real 9.380 secs (user 9.200+0.000 sys 0.180+0.000)
>>>    119
>>>
>>>    $ hg --time debugrevspec 'ancestors(.)&branch(.)' | wc -l
>>>    time: real 10.070 secs (user 9.940+0.000 sys 0.110+0.000)
>>>    119
>>>
>>>    $ hg --time debugrevspec 'ancestors(., branch(.))' | wc -l
>>>    time: real 0.160 secs (user 0.140+0.000 sys 0.020+0.000)
>>>    119
>>
>> Would be interested in output from the perf extension. Probably not
>> worth adding an entry in the official benchmark. But I may be wrong.
>
> What/how? The documentation of perf.py and its intended use is very
> sparse.

yeah, sure, the documentation is hard to get:

   $ hg help perfrevset
   hg perfrevset REVSET

   benchmark the execution time of a revset
   […]

<sarcasm/>

Jokes aside, I'm surprised you have never heard of this six-year-old 
extension. It hosts all kinds of commands dedicated to testing specific 
parts of the Mercurial code.

> There is also no mention of how revsetbenchmarks.txt should
> be used.

Maybe you could look at the revsetbenchmarks.py file right next to it ;-)

> The wiki gives no hits. I assume it is something that is
> related to some Facebook-internal setup using internal repos.

Yeah, sure, as it was Facebook-internal, we put it in the Mercurial 
public repo. And then we had people outside Facebook update its 
content and run it ;-)

revsetbenchmarks.py is a script to run perfrevset (you, again!) for a 
list of revsets against a set of revisions (you are not so interested in 
the revision part).

    $ ./revsetbenchmarks.py <revs-to-be-tested> [-f <file-to-read-revsets-from>]

If -f is omitted, revsets are read from stdin.

revsetbenchmarks.txt is just a list of interesting revsets we want tested 
on a regular basis.
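
For illustration, here is a minimal sketch of that loop. It is not the actual revsetbenchmarks.py, and the helper names are hypothetical. It assumes perf.py lives at contrib/perf.py and is loaded with `hg --config extensions.perf=...`, and it only prints the perfrevset commands instead of running them:

```python
import argparse
import shlex
import sys

# Assumed location of the perf extension inside a Mercurial checkout.
PERF_EXT = "contrib/perf.py"


def read_revsets(lines):
    """Return the stripped, non-empty lines: one revset per line."""
    return [line.strip() for line in lines if line.strip()]


def perfrevset_command(revset):
    """Build the argv that benchmarks a single revset with perfrevset."""
    return ["hg", "--config", "extensions.perf=" + PERF_EXT,
            "perfrevset", revset]


def main(argv=None, stdin=sys.stdin):
    """Print one perfrevset command per revset (read from -f or stdin)."""
    parser = argparse.ArgumentParser(description="revset benchmark sketch")
    parser.add_argument("-f", "--file", help="read revsets from FILE")
    opts = parser.parse_args(argv)
    if opts.file:
        with open(opts.file) as fp:
            revsets = read_revsets(fp)
    else:
        revsets = read_revsets(stdin)
    for revset in revsets:
        # Printed rather than executed, so the sketch has no side effects.
        print(" ".join(shlex.quote(arg) for arg in perfrevset_command(revset)))
```

To turn this into a runnable driver, put `main()` behind an `if __name__ == "__main__":` guard and replace the `print` with a `subprocess` call. The real script also runs each revset against the revisions given on its command line, which this sketch omits.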

>>> diff --git a/mercurial/revset.py b/mercurial/revset.py
>>> --- a/mercurial/revset.py
>>> +++ b/mercurial/revset.py
>>> @@ -17,7 +17,7 @@ import obsolete as obsmod
>>>   import pathutil
>>>   import repoview
>>>
>>> -def _revancestors(repo, revs, followfirst):
>>> +def _revancestors(repo, revs, followfirst, while_=None):
>>>       """Like revlog.ancestors(), but supports followfirst."""
>>>       cut = followfirst and 1 or None
>>>       cl = repo.changelog
>>> @@ -41,10 +41,11 @@ def _revancestors(repo, revs, followfirs
>>>                           revsnode = revqueue.popleft()
>>>                           heapq.heappush(h, -revsnode)
>>>                   seen.add(current)
>>> -                yield current
>>> -                for parent in cl.parentrevs(current)[:cut]:
>>> -                    if parent != node.nullrev:
>>> -                        heapq.heappush(h, -parent)
>>> +                if while_ is None or current in while_:
>>> +                    yield current
>>> +                    for parent in cl.parentrevs(current)[:cut]:
>>> +                        if parent != node.nullrev:
>>> +                            heapq.heappush(h, -parent)
>>>
>>>       return _generatorset(iterate(), iterasc=False)
>>>
>>> @@ -344,15 +345,22 @@ def ancestor(repo, subset, x):
>>>       return baseset([])
>>>
>>>   def _ancestors(repo, subset, x, followfirst=False):
>>> -    args = getset(repo, spanset(repo), x)
>>> +    args = getargs(x, 0, 2, _('ancestors takes no, one or two arguments'))
>>>       if not args:
>>>           return baseset([])
>>> -    s = _revancestors(repo, args, followfirst)
>>> +    heads = getset(repo, _spanset(repo), args[0])
>>> +    if not heads:
>>> +        return baseset([])
>>> +    while_ = None
>>> +    if len(args) > 1:
>>> +        while_ = getset(repo, _spanset(repo), args[1])
>>
>> _spanset(repo) is very very wrong. Should be spanset(repo) (or
>> fullreposet(repo) as returned by the spanset call)
>
> Why? The spanset docstring says the opposite. It suggests that all uses
> of spanset should be replaced by either fullreposet or _spanset.

Because by doing so, you bypass the fullreposet optimisation. If you 
want a spanset over the whole repo, use fullreposet.
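
Here is a toy model of that optimisation, using hypothetical classes rather than the actual revset.py code. Every revision is in the full repository, so `full & other` can simply return `other` (assuming `other` only holds valid revisions). A generic span has to test each member instead:

```python
class SpanSet:
    """Toy stand-in for a contiguous range of revisions [start, end)."""

    def __init__(self, start, end):
        self.start, self.end = start, end
        self.checks = 0  # counts membership tests, to make the cost visible

    def __contains__(self, rev):
        self.checks += 1
        return self.start <= rev < self.end

    def __and__(self, other):
        # Generic intersection: one membership test per element of other.
        return [rev for rev in other if rev in self]


class FullRepoSet(SpanSet):
    """Toy span covering the whole repository (revisions 0..repolen-1)."""

    def __init__(self, repolen):
        super().__init__(0, repolen)

    def __and__(self, other):
        # Assumes other only contains revisions of this repo, so the
        # intersection is other itself: no membership tests at all.
        return list(other)
```

With `revs = [1, 5, 7]`, `SpanSet(0, 10) & revs` performs three membership tests, while `FullRepoSet(10) & revs` performs none and returns the same list.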

See http://selenic.com/hg/rev/fbae659543cf (and related changes) for details.
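
For anyone who wants to play with the 'while' pruning outside Mercurial, here is a self-contained sketch of the loop the patch modifies. It assumes a plain dict mapping each revision to its parent revisions, with -1 standing in for nullrev. It also drops the lazy revqueue handling of the real `_revancestors()`, so the names and structure are illustrative only:

```python
import heapq

NULLREV = -1  # stand-in for mercurial.node.nullrev


def revancestors(parents, revs, followfirst=False, while_=None):
    """Yield ancestors of revs (inclusive) in descending order.

    parents maps rev -> tuple of parent revs. If while_ is given, a
    revision outside it is neither yielded nor expanded, so the walk
    is pruned at that point, as in the patch.
    """
    cut = 1 if followfirst else None  # follow only the first parent
    h = [-rev for rev in revs]  # max-heap via negated revision numbers
    heapq.heapify(h)
    seen = set()
    while h:
        current = -heapq.heappop(h)
        if current in seen:
            continue
        seen.add(current)
        if while_ is None or current in while_:
            yield current
            for parent in parents[current][:cut]:
                if parent != NULLREV:
                    heapq.heappush(h, -parent)
```

Consider the graph `{0: (-1, -1), 1: (0, -1), 2: (1, -1), 3: (1, -1), 4: (2, 3)}`. The call `revancestors(parents, [4], while_={4, 2, 0})` yields only 4 and 2. Revision 0 is in the set but is only reachable through the pruned revision 1. In contrast, `ancestors(4) & {4, 2, 0}` would include it. So the pruned and unpruned forms agree only when every matching ancestor is reachable through matching revisions.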

-- 
Pierre-Yves David