[PATCH 3 of 3] rebase: use matcher to optimize manifestmerge

Mon Mar 20 22:10:08 EDT 2017

On 3/20/17 6:29 PM, Yuya Nishihara wrote:
> On Mon, 20 Mar 2017 17:14:38 -0700, Durham Goode wrote:
>> On 3/20/17 1:14 AM, Yuya Nishihara wrote:
>>> On Sun, 19 Mar 2017 12:00:58 -0700, Durham Goode wrote:
>>>> # HG changeset patch
>>>> # User Durham Goode <durham at fb.com>
>>>> # Date 1489949694 25200
>>>> #      Sun Mar 19 11:54:54 2017 -0700
>>>> # Node ID 800c452bf1a44f9f817174c69443121f4ed4c3b8
>>>> # Parent  d598e42fa629195ecf43f438b71603df9fb66d6d
>>>> rebase: use matcher to optimize manifestmerge
>>>>
>>>> The old merge code would call manifestmerge and calculate the complete diff
>>>> between the source to the destination. In many cases, like rebase, the vast
>>>> majority of differences between the source and destination are irrelevant
>>>> because they are differences between the destination and the common ancestor
>>>> only, and therefore don't affect the merge. Since most actions are 'keep', all
>>>> the effort to compute them is wasted.
>>>>
>>>> Instead, let's compute the difference between the source and the common ancestor
>>>> and only perform the diff of those files against the merge destination. When
>>>> using treemanifest, this lets us avoid loading almost the entire tree when
>>>> rebasing from a very old ancestor. This speeds up rebase of an old stack of 27
>>>> commits by 20x.
>>>
>>> Looks generally good to me, but this needs more eyes.
>>>
>>>> @@ -819,6 +819,27 @@ def manifestmerge(repo, wctx, p2, pa, br
>>>>          if any(wctx.sub(s).dirty() for s in wctx.substate):
>>>>              m1['.hgsubstate'] = modifiednodeid
>>>>
>>>> +    # Don't use m2-vs-ma optimization if:
>>>> +    # - ma is the same as m1 or m2, which we're just going to diff again later
>>>> +    # - The matcher is set already, so we can't override it
>>>> +    # - The caller specifically asks for a full diff, which is useful during bid
>>>> +    #   merge.
>>>> +    if (pa not in ([wctx, p2] + wctx.parents()) and
>>>> +        matcher is None and not forcefulldiff):
>>>
>>> Is this optimization better for normal merge where m2 might be far from m1?
>>
>> I'm not sure what you mean by 'normal merge'.  You mean like an 'hg
>> merge'?  Or like an hg update?  Any merge where you are merging in a
>> small branch will benefit from this (like hg up @ && hg merge
>> myfeaturebranch).
>
> 'hg merge'. What I had in mind was 'hg up stable && hg merge default' where
> pa::p2 would be likely to be as large as p1:p2.

In that case, it may not save any time, but the time would still be 
O(number of incoming changes).

> As for rebase (of non-merge commit), p2.files() might be usable since
> p2.p1() == pa.

Yea, that might be a potential optimization if p2.p1() == pa.  I'm 
hesitant to trust the commit files() list though, but maybe I'm overly 
cautious.