[PATCH 3 of 3] rebase: use matcher to optimize manifestmerge
Yuya Nishihara
yuya at tcha.org
Mon Mar 20 04:14:04 EDT 2017
On Sun, 19 Mar 2017 12:00:58 -0700, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham at fb.com>
> # Date 1489949694 25200
> # Sun Mar 19 11:54:54 2017 -0700
> # Node ID 800c452bf1a44f9f817174c69443121f4ed4c3b8
> # Parent d598e42fa629195ecf43f438b71603df9fb66d6d
> rebase: use matcher to optimize manifestmerge
>
> The old merge code would call manifestmerge and calculate the complete diff
> between the source and the destination. In many cases, like rebase, the vast
> majority of differences between the source and destination are irrelevant
> because they are differences between the destination and the common ancestor
> only, and therefore don't affect the merge. Since most actions are 'keep', all
> the effort to compute them is wasted.
>
> Instead, let's compute the difference between the source and the common ancestor
> and only perform the diff of those files against the merge destination. When
> using treemanifest, this lets us avoid loading almost the entire tree when
> rebasing from a very old ancestor. This speeds up rebase of an old stack of 27
> commits by 20x.
Looks generally good to me, but this needs more eyes.
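For readers following along, the idea in the commit message can be sketched
roughly as below. This is only an illustration, not the code in the patch:
plain dicts stand in for manifests, and diff() and narrowed_diff() are
hypothetical helpers, not Mercurial APIs.

```python
def diff(a, b):
    """Return {path: (a_node, b_node)} for every path whose entry differs."""
    return {p: (a.get(p), b.get(p))
            for p in set(a) | set(b) if a.get(p) != b.get(p)}


def narrowed_diff(m1, m2, ma):
    """Diff m1 against m2, restricted to paths that changed between ma and m2."""
    # Files the source side (m2) touched relative to the common ancestor (ma).
    relevant = set(diff(ma, m2))
    # Restrict both sides to those files before comparing them.
    m1r = {p: n for p, n in m1.items() if p in relevant}
    m2r = {p: n for p, n in m2.items() if p in relevant}
    return diff(m1r, m2r)


ma = {'a': 1, 'b': 1, 'c': 1}            # common ancestor
m1 = {'a': 1, 'b': 2, 'c': 2}            # destination: changed b and c
m2 = {'a': 1, 'b': 1, 'c': 3, 'd': 1}    # source: changed c, added d

full = diff(m1, m2)                # includes 'b', which only the destination changed
narrow = narrowed_diff(m1, m2, ma) # 'b' is never examined
```

Here 'b' differs between m1 and m2 only because the destination changed it,
so it would end up as a 'keep' action; the narrowed diff skips it entirely.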
> @@ -819,6 +819,27 @@ def manifestmerge(repo, wctx, p2, pa, br
>          if any(wctx.sub(s).dirty() for s in wctx.substate):
>              m1['.hgsubstate'] = modifiednodeid
>
> +    # Don't use m2-vs-ma optimization if:
> +    # - ma is the same as m1 or m2, which we're just going to diff again later
> +    # - The matcher is set already, so we can't override it
> +    # - The caller specifically asks for a full diff, which is useful during bid
> +    #   merge.
> +    if (pa not in ([wctx, p2] + wctx.parents()) and
> +        matcher is None and not forcefulldiff):
Is this optimization also better for a normal merge, where m2 might be far from m1?
> +        # Identify which files are relevant to the merge, so we can limit the
> +        # total m1-vs-m2 diff to just those files. This has significant
> +        # performance benefits in large repositories.
> +        relevantfiles = set(ma.diff(m2).keys())
> +
> +        # For copied and moved files, we need to add the source file too.
> +        for copykey, copyvalue in copy.iteritems():
> +            if copyvalue in relevantfiles:
> +                relevantfiles.add(copykey)
> +        for movedirkey in movewithdir.iterkeys():
> +            relevantfiles.add(movedirkey)
> +        matcher = matchmod.match(repo.root, '',
> +                                 ('path:%s' % p for p in relevantfiles))
Perhaps we can use scmutil.matchfiles() here. Also, patterns shouldn't be a
generator, since it may be evaluated as a boolean, and a generator object is
always truthy even when it yields nothing.
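The generator concern can be reproduced in a small standalone sketch.
has_patterns() is a hypothetical stand-in for any code that tests
"if patterns:" before using them; it is not a Mercurial function.

```python
def has_patterns(patterns):
    # Hypothetical check, mimicking code that tests "if patterns:".
    return bool(patterns)


relevantfiles = set()  # nothing changed between ma and m2

# A generator expression is an object, so it is truthy even with no items.
gen = ('path:%s' % p for p in relevantfiles)
# A list is falsy when empty, which is what such a check expects.
lst = ['path:%s' % p for p in relevantfiles]

gen_truthy = has_patterns(gen)    # True, although there are no patterns
list_truthy = has_patterns(lst)   # False
```

Building a list (or passing the file names directly) avoids the mismatch.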