[PATCH V2] copies: optimize forward copy detection logic for rebases

Augie Fackler raf at durin42.com
Fri Feb 5 16:43:25 EST 2016


On Fri, Feb 05, 2016 at 01:23:56PM -0800, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham at fb.com>
> # Date 1454707404 28800
> #      Fri Feb 05 13:23:24 2016 -0800
> # Node ID 64726d7baee7e6a32780a31e861cb5e9af8aefc6
> # Parent  01a5143cd25f285f8c745a92986cd7186bb32c90
> copies: optimize forward copy detection logic for rebases

Dropped v1 and queued this one, many thanks!

>
> Forward copy detection (i.e. detecting what files have been moved/copied in
> commit X since ancestor Y) previously required diff'ing the manifests of both X
> and Y. This was expensive since it required reading both entire manifests and
> doing a set difference (they weren't already in a set because of the
> lazymanifest work). This cost almost 1 second on very large repositories, and
> happens N times for a rebase of N commits.
>
> This patch optimizes it for the case of rebase. In a rebase, we are comparing a
> commit against it's immediate parent, and therefore we can know what files
> changed by looking at ctx.files().  This let's us drastically decrease the size
> of the set comparison, and makes it O(# of changes) instead of O(size of
> manifest). This makes it take 1ms instead of 1000ms.
>
> diff --git a/mercurial/copies.py b/mercurial/copies.py
> --- a/mercurial/copies.py
> +++ b/mercurial/copies.py
> @@ -10,7 +10,9 @@ from __future__ import absolute_import
>  import heapq
>
>  from . import (
> +    node,
>      pathutil,
> +    scmutil,
>      util,
>  )
>
> @@ -175,7 +177,18 @@ def _forwardcopies(a, b, match=None):
>      # we currently don't try to find where old files went, too expensive
>      # this means we can miss a case like 'hg rm b; hg cp a b'
>      cm = {}
> -    missing = _computeforwardmissing(a, b, match=match)
> +
> +    # Computing the forward missing is quite expensive on large manifests, since
> +    # it compares the entire manifests. We can optimize it in the common use
> +    # case of computing what copies are in a commit versus its parent (like
> +    # during a rebase or histedit). Note, we exclude merge commits from this
> +    # optimization, since the ctx.files() for a merge commit is not correct for
> +    # this comparison.
> +    forwardmissingmatch = match
> +    if not match and b.p1() == a and b.p2().node() == node.nullid:
> +        forwardmissingmatch = scmutil.matchfiles(a._repo, b.files())
> +    missing = _computeforwardmissing(a, b, match=forwardmissingmatch)
> +
>      ancestrycontext = a._repo.changelog.ancestors([b.rev()], inclusive=True)
>      for f in missing:
>          fctx = b[f]
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


More information about the Mercurial-devel mailing list