[PATCH 4 of 4] revset: do not nest addset by "addset + other" (issue4565)
Yuya Nishihara
yuya at tcha.org
Fri May 22 07:29:46 CDT 2015
On Thu, 21 May 2015 23:22:03 -0500, Pierre-Yves David wrote:
> On 05/21/2015 07:28 AM, Yuya Nishihara wrote:
> > On Wed, 20 May 2015 23:26:35 -0500, Pierre-Yves David wrote:
> >> On 05/19/2015 07:18 AM, Yuya Nishihara wrote:
> >>> # HG changeset patch
> >>> # User Yuya Nishihara <yuya at tcha.org>
> >>> # Date 1426409247 -32400
> >>> # Sun Mar 15 17:47:27 2015 +0900
> >>> # Node ID 72e8a0b846616a92781beda44eab8d6c39c91933
> >>> # Parent 9996864f9e14a826ebe20df3a48905a662fe8eba
> >>> revset: do not nest addset by "addset + other" (issue4565)
> >>
> >> There is good idea in this series and some interesting win.
> >>
> >> However, this is adding significant more complexity and special casing
> >> to the code. I'm feeling very uncomfortable about that since the
> >> smartset class are already "not simple". I feel like we'll need to start
> >> converting the code object to C class soon and every special case here
> >> (eg: new class).
> >
> > Perhaps we can merge unionset and addset in C implementation. It just exists
> > to avoid the use of "any()".
>
> Since the only different is for the __contains__ and __nonzero__ case,
> we could keep a single class: 'addset(*subsets)' and monkey patch this
> them in the len(subsets) == 2 case. We are already using method monkey
> patching for optimisation purpose in other place of the code.
>
> (and yes, I kindof insist on the *subsets) syntax.
Okay, I prefer sub-classing than monkey-patching, and an explicit list than
*expansion, but they're just a matter of taste. And we won't need them if
we choose the balanced addsets plan.
> >> There is also some more generic optimisation that could
> >> be worthwhile. We should meet on IRC/video call to discuss the situation
> >> in details
> >>
> >> Here is the nice thing I see in your series:
> >> - The logarithmic combination used for the sorting,
> >> - The creation of a wider object when something is added to a addset
> >> (wider instead of deeper)
> >>
> >> There is couple of generic stuff that are worth looking at too:
> >> - The optimizer could directly build a balanced tree for addition
> >> instead of making it pure depth.
> >
> > Yes for "a + b + c", but probably no for "-r a -r b -r c".
>
> the -r case seems even simpler as we control the creation of the things
> from start, the revrange logic should probably (1) first get a smartset
> from all elements in the list (2) then combine them with either a
> balanced tree addset or a addset(*results) if we get to that.
Good point. It seems I just didn't want to touch the big loop in revrange().
I'll take a look.
> Since your current union set implementation is already relying on
> "balanced tree" for its recursive execution of __iter__, we can probably
> rely on balanced tree first.
[...]
> 1) build baseset([a, b, c]) directly from "a + b + c"
> (bonus point if it simple enough to go on stable)
>
> 2) leave all of 'addset' implementation but for __init__,
> We allow 'addset(*subsets)' and this automatically build a balanced
> tree of addset with 2 children in each node.
>
> 3) We use a single addset() call to unify all elements in a revrange() call
>
> 4) The optimizer transforms nested addset into a single addset() call.
>
> 5.1) We look at the benefit of replacing the tree build by 'addset' with
> a single class.
>
> 5.2) We play with the idea of building baseset from operation involving
> baseset more directly.
(1) depends on (4).
My plan (that will use balanced addsets):
1) transform syntax tree: (+ (+ a b) c) -> (+ a b c)
2) build balanced addsets from it (general case)
3) optimize for trivial revisions: "a + b + c" -> baseset([a, b, c])
4) fix scmutil.revrange() not to chain addsets
> > My IRC time is around 12:00-14:00 UTC (21:00-23:00 JST) on weekdays,
> > or weekend?
>
> Urg this is 03:00 05:00 in my timezone. I understand why we very rarely
> cross each other.
I live in the other side of the earth. ;)
More information about the Mercurial-devel
mailing list