[PATCH 3 of 3 STABLE] revset: more optimally handle __sub__ (issue4352)

Gregory Szorc gregory.szorc at gmail.com
Mon Sep 8 11:27:27 CDT 2014


On 9/8/14 4:59 AM, Pierre-Yves David wrote:
>
>
> On 09/08/2014 03:44 AM, Gregory Szorc wrote:
>> # HG changeset patch
>> # User Gregory Szorc <gregory.szorc at gmail.com>
>> # Date 1410111760 25200
>> #      Sun Sep 07 10:42:40 2014 -0700
>> # Node ID e2d2918c41fb0fe878afc3a2796f5a1b68939f46
>> # Parent  5857865f46f0275786b68ae8dc64c579edcddabb
>> revset: more optimally handle __sub__ (issue4352)
>>
>> This patch introduces the _diffset class. This class is used to more
>> optimally compute set differences (__sub__).
>>
>> The initial implementation in this patch is not yet fully optimal.
>> If the set on the right hand side of the "-" operator is a lazy set,
>> it will be evaluated fully before results are returned. In the
>> future, this set should be iterated much like _addset does it.
>  >
>> The revset benchmarks run against a clone of the Firefox repository
>> show the following statistically significant changes (before
>> followed by after):
>>
>> revset #6: roots(0::tip)
>> wall 0.712861 comb 0.710000 user 0.700000 sys 0.010000 (best of 14)
>> wall 0.745027 comb 0.750000 user 0.750000 sys 0.000000 (best of 13)
>>
>> revset #9: author(lmoscovicz) or author(mpm)
>> wall 39.410163 comb 39.140000 user 38.340000 sys 0.800000 (best of 3)
>> wall 45.392911 comb 45.030000 user 43.740000 sys 1.290000 (best of 3)
>>
>> revset #10: author(mpm) or author(lmoscovicz)
>> wall 39.231088 comb 38.950000 user 38.180000 sys 0.770000 (best of 3)
>> wall 45.568156 comb 45.170000 user 43.820000 sys 1.350000 (best of 3)
>>
>> revset #16: roots((0::) - (0::tip))
>> wall 459.033609 comb 458.880000 user 458.850000 sys 0.030000 (best of 3)
>> wall  10.230451 comb  10.220000 user  10.200000 sys 0.020000 (best of 3)
>>
>> revset #22: max(::(tip~20) - obsolete())
>> wall 0.888049 comb 0.890000 user 0.890000 sys 0.000000 (best of 11)
>> wall 1.040373 comb 1.030000 user 1.030000 sys 0.000000 (best of 10)
>>
>> revset #24: (not public() - obsolete())
>> wall 0.377076 comb 0.380000 user 0.380000 sys 0.000000 (best of 26)
>> wall 0.191621 comb 0.190000 user 0.190000 sys 0.000000 (best of 50)
>>
>> While there are a handful of small regressions, there is a major win
>> in revset #16. Since a version of this revset runs at clone time and
>> makes cloning the Firefox repository many minutes slower than it
>> should be, the author believes the temporary and minor regressions
>> are justified.
>
> It has quite bad impact (+10%) on multiple revset. Some of them does not
> seems to involved substraction at all. Did you dig a bit the source of
> these regression?

revset.orset() performs a subtraction. In the case of the author() 
benchmarks, subset is a spanset covering the entire repository.

I /think/ the regression stems from the isinstance(x, baseset) handling 
in spanset.__sub__. Where we once were performing simple set math, now 
we perform something slightly more complicated.

Making _diffset._iterator work lazily in all cases should reclaim the 
performance loss. Or, we could add back in the special case handling for 
lazy-vs-non-lazy instances. This code is getting pretty complicated...


More information about the Mercurial-devel mailing list