[PATCH 2 of 2] revset: speed up existence checks for ordered filtered sets

Mon Sep 21 23:28:46 CDT 2015

On Mon, 21 Sep 2015 10:14:21 -0700, Durham Goode wrote:
> On 9/21/15 9:25 AM, Yuya Nishihara wrote:
> > On Sun, 20 Sep 2015 20:51:36 -0700, Durham Goode wrote:
> >> # HG changeset patch
> >> # User Durham Goode <durham at fb.com>
> >> # Date 1442793222 25200
> >> #      Sun Sep 20 16:53:42 2015 -0700
> >> # Node ID 46a81d56dfabfc6ed8e1c08dae41b840574b8835
> >> # Parent  9deecd57915da9ed217abafb81cf8cba506b2849
> >> revset: speed up existence checks for ordered filtered sets
> >>
> >> Previously, calling 'if foo:' on a ordered filtered set would start iterating in
> >> whatever the current direction was and return if a value was available. If the
> >> current direction was ascending, but the set had a fastdesc available, this
> >> meant we did a lot more work than necessary.
> >>
> >> If this was applied without my previous max/min fixes, it would improve max()
> >> performance (this was my first attempt at fixing the issue). Since those
> >> previous fixes went in though, this doesn't have a visible benefit in the
> >> benchmarks, but it does seem clearly better than it was before so I think it
> >> should still go in.
> >>
> >> diff --git a/mercurial/revset.py b/mercurial/revset.py
> >> --- a/mercurial/revset.py
> >> +++ b/mercurial/revset.py
> >> @@ -3111,7 +3111,12 @@ class filteredset(abstractsmartset):
> >>           return lambda: self._iterfilter(it())
> >>   
> >>       def __nonzero__(self):
> >> -        for r in self:
> >> +        it = self
> >> +        fast = self.fastdesc or self.fastasc
> >> +        if fast:
> >> +            it = fast()
> >> +
> >> +        for r in it:
> >>               return True
> > I think the PATCH 1 is great, but I'm not sure about this.
> >
> > If I understand it, this is fast only if the filtered revisions exist near
> > fastdesc side?
> >
> >    condition = parents(. + .^)  # subset.fastdesc can reach the condition faster
> >    subset = fullreposet
>
> We can reverse the preference order if you want ("fastasc or 
> fastdesc").  That will maintain the old behavior of preferring ascending 
> orders.  The point is that self.__iter__ is not known to be fast, so if 
> we do have a fast iterator available we should use it.

I got the point that __iter__ can be slower than the fast versions, but I
don't think the iteration cost is that important here.

Revision:
0) @
1) self.fastdesc or self.fastasc (PATCH 2)
2) self.fastasc or self.fastdesc

revset #0: max(parents(. + .^) - (. + .^)  & ::@)
0) 0.038698
1) 0.000446   1%
2) 0.038745

revset #1: min(parents(10 + 10^) - (10 + 10^) & ::@)
0) 0.057440
1) 0.096196 167%
2) 0.057400

revset #2: max(author(lmoscovicz))
0) 1.364134
1) 0.317355  23%
2) 1.349448

revset #3: min(author(mason))
0) 0.033658
1) 1.390670  x41
2) 0.033108