[PATCH 3 of 3 STABLE] revset: more optimally handle sub (issue4352)

Mon Sep 8 13:14:46 CDT 2014

On 9/8/14 9:43 AM, Pierre-Yves David wrote:
>
>
> On 09/08/2014 06:27 PM, Gregory Szorc wrote:
>> On 9/8/14 4:59 AM, Pierre-Yves David wrote:
>>>
>>>
>>> On 09/08/2014 03:44 AM, Gregory Szorc wrote:
>
>>>
>>> It has quite bad impact (+10%) on multiple revset. Some of them does not
>>> seems to involved substraction at all. Did you dig a bit the source of
>>> these regression?
>>
>> revset.orset() performs a subtraction. In the case of the author()
>> benchmarks, subset is a spanset covering the entire repository.
>>
>> I /think/ the regression stems from the isinstance(x, baseset) handling
>> in spanset.__sub__. Where we once were performing simple set math, now
>> we perform something slightly more complicated.
>
> Can I get you to move from "I think" to "I check profile and it appears
> that"?
>
> If the isinstance is actually slowing things down, we should probably
> replace it by wome appropriate attribute lookup (duck typing my love)

No, the isinstance() check would be speeding it up by optimizing a 
special case where the subtracting set is fully known. This handling is 
just a workaround.

>> Making _diffset._iterator work lazily in all cases should reclaim the
>> performance loss. Or, we could add back in the special case handling for
>> lazy-vs-non-lazy instances. This code is getting pretty complicated...

I implemented a lazy _diffset._iterator and the speed penalty still 
remains. Running the author() revsets under --profile doesn't reveal any 
obvious culprit: these revsets spend most of their time doing changelog 
reading. However, the rewritten _diffset logic does appear to do 
slightly /more/ changelog reading. That's likely the culprit. Now, to 
find out why...

I may end up submitting a patch that only solves the lazy revset 
explosion for the roots() revset I identified because, well, that is 
making clones painful and a partial solution is better than no solution.

[PATCH 3 of 3 STABLE] revset: more optimally handle __sub__ (issue4352)

[PATCH 3 of 3 STABLE] revset: more optimally handle sub (issue4352)