[PATCH] phase: improve retractboundary perf

Thu Nov 12 13:33:39 CST 2015

On 11/11/2015 03:20 PM, Durham Goode wrote:
> On 11/9/15 1:12 PM, Martin von Zweigbergk wrote:
>>
>>
>> On Sat, Nov 7, 2015 at 4:13 PM Durham Goode <durham at fb.com
>> <mailto:durham at fb.com>> wrote:
>>
>>     # HG changeset patch
>>     # User Durham Goode <durham at fb.com <mailto:durham at fb.com>>
>>     # Date 1446941509 28800
>>     #      Sat Nov 07 16:11:49 2015 -0800
>>     # Node ID 60db725e6c3d79385908a948ce5620419c22ac51
>>     # Parent  3b5d30cbbf1132be2325c5362a156038bdc84e2c
>>     phase: improve retractboundary perf
>>
>>     The existing retractboundary implementation computed the new
>>     boundary by walking
>>     all descendants of all existing roots and computing the new roots.
>>     This is
>>     O(commits since first root), which on long repos can be hundreds
>>     of thousands of
>>     commits.
>>
>>     The new algorithm only updates roots that are greater than the new
>>     root
>>     locations. For common operations like commit on a repo with the
>>     earliest root
>>     several hundred thousand commits ago, this makes retractboundary
>>     go from
>>     1 second to 0.008 seconds.
>>
>>
> There was some discussion on IRC about the safety of this (i.e. what if
> the new nodes are already part of the phase, etc).  I've looked into it
> and believe this patch is safe:
>
> 1) The old existing code already filters the input nodes to only contain
> nodes that require retracting (i.e. we only make node X a new root if
> the old phase is less than the target phase), so there's no chance of us
> adding a unnecessary root to the phase (unless the input root is made
> unnecessary by another root in the same input, but see point #3).
>
> 2) Another way of thinking about this is:  the only way the new
> algorithm would be different from the old algorithm is if it added a
> root that is a descendant of an old root (since the old algorithm
> would've caught this in the big "roots(%ln::)".  At the beginning of the
> function, when we filter out roots that already meet the phase criteria,
> the *definition* of meeting the phase criteria is "not being a
> descendant of an existing root".  Therefore, by definition none of the
> new roots we are processing are descendants of an existing root.
>
> 3) If two nodes are passed in as input, and one node is an ancestor of
> the other (and therefore the later node should not be a root), this is
> still caught by the 'roots(%ln::)' revset. So there's no chance of an
> extra root being introduced that way either.

I've added this explanation to the description and pushed the result to 
the clowncopter. Thanks for double checking.

-- 
Pierre-Yves David