[PATCH 4 of 4] revset: do not nest addset by "addset + other" (issue4565)

Pierre-Yves David pierre-yves.david at ens-lyon.org
Thu May 21 23:22:03 CDT 2015

On 05/21/2015 07:28 AM, Yuya Nishihara wrote:
> On Wed, 20 May 2015 23:26:35 -0500, Pierre-Yves David wrote:
>> On 05/19/2015 07:18 AM, Yuya Nishihara wrote:
>>> # HG changeset patch
>>> # User Yuya Nishihara <yuya at tcha.org>
>>> # Date 1426409247 -32400
>>> #      Sun Mar 15 17:47:27 2015 +0900
>>> # Node ID 72e8a0b846616a92781beda44eab8d6c39c91933
>>> # Parent  9996864f9e14a826ebe20df3a48905a662fe8eba
>>> revset: do not nest addset by "addset + other" (issue4565)
>> There is good idea in this series and some interesting win.
>> However, this is adding significant more complexity and special casing
>> to the code. I'm feeling very uncomfortable about that since the
>> smartset class are already "not simple". I feel like we'll need to start
>> converting the code object to C class soon and every special case here
>> (eg: new class).
> Perhaps we can merge unionset and addset in C implementation. It just exists
> to avoid the use of "any()".

Since the only different is for the __contains__ and __nonzero__ case, 
we could keep a single class: 'addset(*subsets)' and monkey patch this 
them in the len(subsets) == 2 case. We are already using method monkey 
patching for optimisation purpose in other place of the code.

(and yes, I kindof insist on the *subsets) syntax.

>> There is also some more generic optimisation that could
>> be worthwhile. We should meet on IRC/video call to discuss the situation
>> in details
>> Here is the nice thing I see in your series:
>> - The logarithmic combination used for the sorting,
>> - The creation of a wider object when something is added to a addset
>> (wider instead of deeper)
>> There is couple of generic stuff that are worth looking at too:
>> - The optimizer could directly build a balanced tree for addition
>> instead of making it pure depth.
> Yes for "a + b + c", but probably no for "-r a -r b -r c".

the -r case seems even simpler as we control the creation of the things 
from start, the revrange logic should probably (1) first get a smartset 
from all elements in the list (2) then combine them with either a 
balanced tree addset or a addset(*results) if we get to that.

Since your current union set implementation is already relying on 
"balanced tree" for its recursive execution of __iter__, we can probably 
rely on balanced tree first.

>> - We could directly performs add operation for smartset that are already
>> all known (baseset + baseset, the case that actually interest use for
>> both currently open issue)
> Isn't it too optimistic?

The layer of smartset created by this combination as a very significant 
cost because it all python function call, and python object lookup. 
operation on base type in python is quite fast. The reason we have lazy 
revset is to lazily evaluate access to the Mercurial layer stuff like 
'children' or 'branch' that are very slow to compute. Combining list and 
set should be very fast. We know it is simple and safe to combine two 
baseset A and B if:

1) sorted(A+B) → baseset(A.set() | B.set(), ascending=True)
2) A.isdisjoint(B) -> baseset(chain(A, B))
   (with even some ascendingness information if ascending/descending and 
in properly ordered)

We can probably use (1) to simplifies things when the addset is sorted 
(2) mean we can avoid addset together in the common case of the issue we 
are trying to solve.

Using (2) would lead to quadratic memory allocation (logarithmic if we 
have balanced tree. But is still most likely a win for any case that 

> FWIW, I have 10 more patches that will do

Woot! This is very interesting information.

>   - fix stack overflow on alias expansion and optimize() (issue4624)

I assume this rely on existence of a unionset like object. right?

>   - build unionset([A, B, C]) directly from "a() + b() + c()"
 >     (perhaps, this can be balanced addsets)

I'm unclear how this is different from the previous bit? Is the previous 
bit is about --rev

>   - build baseset([a, b, c]) directly from "a + b + c"

This looks like awesome and a very good first step. Does this mean we 
have a reliable way to detect that "a" is a single rev?

 From these extra data I would suggest the following plan:

1) build baseset([a, b, c]) directly from "a + b + c"
   (bonus point if it simple enough to go on stable)

2) leave all of 'addset' implementation but for __init__,
    We allow 'addset(*subsets)' and this automatically build a balanced 
tree of addset with 2 children in each node.

3) We use a single addset() call to unify all elements in a revrange() call

4) The optimizer transforms nested addset into a single addset() call.

5.1) We look at the benefit of replacing the tree build by 'addset' with 
a single class.

5.2) We play with the idea of building baseset from operation involving 
baseset more directly.

Sounds like a plan?

>> Also
>> - We should also determine if it is safe to alter smartset in place.
>> This could simplify multiple things.
>> When and how could we discuss this?
> My IRC time is around 12:00-14:00 UTC (21:00-23:00 JST) on weekdays,
> or weekend?

Urg this is 03:00 05:00 in my timezone. I understand why we very rarely 
cross each other.

Pierre-Yves David

More information about the Mercurial-devel mailing list