D6255: copies: calculate mergecopies() based on pathcopies()
martinvonz (Martin von Zweigbergk)
phabricator at mercurial-scm.org
Tue Apr 16 16:31:54 EDT 2019
martinvonz added a comment.
In https://phab.mercurial-scm.org/D6255#91015, @marmoute wrote:
> I did a first path through it, the new code seems reasonable and easier
> to read than the previous one. Some comments and questions below.
The rest somehow didn't make it here, so I'll copy from the email (i.e. the below is from Pierre-Yves, not from me):
I did a first path through it, the new code seems reasonable and easier
to read than the previous one. Some comments and questions below.
On 4/16/19 7:19 PM, martinvonz (Martin von Zweigbergk) wrote:
> martinvonz created this revision.
> Herald added subscribers: mercurial-devel, mjpieters.
> Herald added a reviewer: hg-reviewers.
>
> REVISION SUMMARY
> When copies are stored in changesets, we need a changeset-centric
> version of mergecopies() just like we have a changeset-centric version
> of pathcopies(). I think the natural way of thinking about
> mergecopies() is in terms of pathcopies() from the base to each of the
> commits. So if we can rewrite mergecopies() based two such
> pathcopies() calls, we'll get the changeset-centric version for
> free. That's what this patch does.
I like the approach, if I understand it correctly, we'll have less
duplicated code in the end.
>
> A nice bonus is that it ends up being a lot simpler. mergecopies() has
> accumulated a lot of technical debt over time. One good example is the
> code for dealing with grafts (the "partial/incomplete/dirty"
> stuff). Since pathcopies() already deals with backwards renames and
> ping-pong renames, we get that for free.
>
> I've run tests with hard-coded debug logging for "fullcopy" and while
> I haven't looked at every difference it produces, all the ones I have
> looked at seemed reasonable to me.
How many difference did you had? can you share some example of them with
their explanation?
on related topic, you seems to be fixing a case in
test-fastannotate-hg.t and test-annotate-hg.t that is probably worth
mentioning.
> One drawback of the rewritten code is that we may now make
> remotefilelog prefetch more files. We used to prefetch files that were
> unique to either side of the merge compared to the other. We now
> prefetch files that are unique to either sise of the merge compared to
sise → side
> the base. This means that if you added the same file to each side, we
> would not prefetch it before, but we would now. Such cases are
> probably quite rare, but one likely scenario where they happen is when
> moving from a commit to its successor (or the other way around). The
> user will probably already have the files in the cache in such cases,
> so it's probably not a big deal.
>
> Some timings for calculating mergecopies between two revisions (all
> using the common ancestor as base):
Which revisions did you pick?
(for the record, the benchmark suite use 1daa622bbe42 and 76caed42cf7c)
> In the hg repo:
> 4.8 4.8: 0.21s -> 0.21s
> 4.0 4.8: 0.35s -> 0.63s
>
> In and old copy of the mozilla-unified repo:
> FIREFOX_BETA_60_BASE^ FIREFOX_BETA_60_BASE: 0.51s -> 0.60s
> FIREFOX_NIGHTLY_59_END FIREFOX_BETA_60_BASE: 2.1s -> 2.3s
> FIREFOX_BETA_59_END FIREFOX_BETA_60_BASE: 3.1s -> 3.3s
> FIREFOX_AURORA_50_BASE FIREFOX_BETA_60_BASE: 30s -> 35s
>
> So it's measurably slower in most cases. Note that merge copies are
> not calculated when updating with a clean working copy, which is
> probably the most common case. I therefore think the much simpler code
> is worth the slowdown.
Do you know where the slowdown comes from ?
> REPOSITORY
> https://phab.mercurial-scm.org/diffusion/HG/ Mercurial
>
> REVISION DETAIL
> https://phab.mercurial-scm.org/D6255
>
> AFFECTED FILES
> mercurial/copies.py
> tests/test-annotate.t
> tests/test-fastannotate-hg.t
> tests/test-graft.t
> tests/test-rename-merge2.t
>
> CHANGE DETAILS
>
> diff --git a/tests/test-rename-merge2.t b/tests/test-rename-merge2.t
>
> - a/tests/test-rename-merge2.t +++ b/tests/test-rename-merge2.t @@ -433,6 +433,9 @@
>
> --------------
> test L:nc a b R:up b W: - 12 merge b no ancestor
> --------------
> + all copies found (* = to merge, ! = divergent, % = renamed and deleted):
> + src: 'a' -> dst: 'b'
> + checking for directory renames
> resolving manifests
> branchmerge: True, force: False, partial: False
> ancestor: 924404dff337, local: 86a2aa42fc76+, remote: af30c7647fc7
> @@ -469,6 +472,9 @@
> --------------
> test L:up b R:nm a b W: - 13 merge b no ancestor
> --------------
> + all copies found (* = to merge, ! = divergent, % = renamed and deleted):
> + src: 'a' -> dst: 'b'
> + checking for directory renames
> resolving manifests
> branchmerge: True, force: False, partial: False
> ancestor: 924404dff337, local: 59318016310c+, remote: bdb19105162a
> @@ -506,6 +512,9 @@
> --------------
> test L:nc a b R:up a b W: - 14 merge b no ancestor
> --------------
> + all copies found (* = to merge, ! = divergent, % = renamed and deleted):
> + src: 'a' -> dst: 'b'
> + checking for directory renames
> resolving manifests
> branchmerge: True, force: False, partial: False
> ancestor: 924404dff337, local: 86a2aa42fc76+, remote: 8dbce441892a
> @@ -543,6 +552,9 @@
> --------------
> test L:up b R:nm a b W: - 15 merge b no ancestor, remove a
> --------------
> + all copies found (* = to merge, ! = divergent, % = renamed and deleted):
> + src: 'a' -> dst: 'b'
> + checking for directory renames
> resolving manifests
> branchmerge: True, force: False, partial: False
> ancestor: 924404dff337, local: 59318016310c+, remote: bdb19105162a
> @@ -580,6 +592,9 @@
> --------------
> test L:nc a b R:up a b W: - 16 get a, merge b no ancestor
> --------------
> + all copies found (* = to merge, ! = divergent, % = renamed and deleted):
> + src: 'a' -> dst: 'b'
> + checking for directory renames
> resolving manifests
> branchmerge: True, force: False, partial: False
> ancestor: 924404dff337, local: 86a2aa42fc76+, remote: 8dbce441892a
> @@ -617,6 +632,9 @@
> --------------
> test L:up a b R:nc a b W: - 17 keep a, merge b no ancestor
> --------------
> + all copies found (* = to merge, ! = divergent, % = renamed and deleted):
> + src: 'a' -> dst: 'b'
> + checking for directory renames
> resolving manifests
> branchmerge: True, force: False, partial: False
> ancestor: 924404dff337, local: 0b76e65c8289+, remote: 4ce40f5aca24
> @@ -653,6 +671,9 @@
> --------------
> test L:nm a b R:up a b W: - 18 merge b no ancestor
> --------------
> + all copies found (* = to merge, ! = divergent, % = renamed and deleted):
> + src: 'a' -> dst: 'b'
> + checking for directory renames
> resolving manifests
> branchmerge: True, force: False, partial: False
> ancestor: 924404dff337, local: 02963e448370+, remote: 8dbce441892a
> @@ -695,6 +716,9 @@
> --------------
> test L:up a b R:nm a b W: - 19 merge b no ancestor, prompt remove a
> --------------
> + all copies found (* = to merge, ! = divergent, % = renamed and deleted):
> + src: 'a' -> dst: 'b'
> + checking for directory renames
> resolving manifests
> branchmerge: True, force: False, partial: False
> ancestor: 924404dff337, local: 0b76e65c8289+, remote: bdb19105162a
> diff --git a/tests/test-graft.t b/tests/test-graft.t
>
> - a/tests/test-graft.t +++ b/tests/test-graft.t @@ -75,6 +75,8 @@
>
>
> $ hg graft -r 2 --base 3
> grafting 2:5c095ad7e90f "2"
> + note: possible conflict - c was deleted and renamed to:
> + a
> note: graft of 2:5c095ad7e90f created no changes to commit
>
> Can't continue without starting:
> @@ -220,6 +222,9 @@
> committing changelog
> updating the branch cache
> grafting 5:97f8bfe72746 "5"
> + all copies found (* = to merge, ! = divergent, % = renamed and deleted):
> + src: 'c' -> dst: 'b'
> + checking for directory renames
> resolving manifests
> branchmerge: True, force: True, partial: False
> ancestor: 4c60f11aa304, local: 6b9e5368ca4e+, remote: 97f8bfe72746
> @@ -233,6 +238,9 @@
> $ HGEDITOR=cat hg graft 4 3 --log --debug
> scanning for duplicate grafts
> grafting 4:9c233e8e184d "4"
> + all copies found (* = to merge, ! = divergent, % = renamed and deleted):
> + src: 'c' -> dst: 'b'
> + checking for directory renames
> resolving manifests
> branchmerge: True, force: True, partial: False
> ancestor: 4c60f11aa304, local: 1905859650ec+, remote: 9c233e8e184d
> @@ -1129,7 +1137,6 @@
> grafting 2:f58c7e2b28fa "C0"
> merging f1e and f1b to f1e
> merging f2a and f2c to f2c
> - merging f5b and f5a to f5a
>
> Test the cases A.1 (f4x) and A.7 (f3x).
>
> diff --git a/tests/test-fastannotate-hg.t b/tests/test-fastannotate-hg.t
>
> - a/tests/test-fastannotate-hg.t +++ b/tests/test-fastannotate-hg.t @@ -273,37 +273,10 @@
>
> > EOF
> $ hg ci -mc -d '3 0'
> created new head
> -BROKEN: 'a' was copied to 'b' on both sides. We should not get a merge conflict here
> $ hg merge
> merging b
> - warning: conflicts while merging b! (edit, then use 'hg resolve --mark')
> - 0 files updated, 0 files merged, 0 files removed, 1 files unresolved
> - use 'hg resolve' to retry unresolved file merges or 'hg merge --abort' to abandon
> - [1]
> - $ cat b
> - <<<<<<< working copy: b80e3e32f75a - test: c
> - a
> - z
> - a
> - ||||||| base
> - =======
> - a
> - a
> - a
> - b4
> - c
> - b5
> - >>>>>>> merge rev: 64afcdf8e29e - test: mergeb
> - $ cat <<EOF > b
> - > a
> - > z
> - > a
> - > b4
> - > c
> - > b5
> - > EOF
> - $ hg resolve --mark -q
> - $ rm b.orig
> + 0 files updated, 1 files merged, 0 files removed, 0 files unresolved
> + (branch merge, don't forget to commit)
> $ echo d >> b
> $ hg ci -mmerge2 -d '4 0'
>
> diff --git a/tests/test-annotate.t b/tests/test-annotate.t
>
> - a/tests/test-annotate.t +++ b/tests/test-annotate.t @@ -273,37 +273,10 @@
>
> > EOF
> $ hg ci -mc -d '3 0'
> created new head
> -BROKEN: 'a' was copied to 'b' on both sides. We should not get a merge conflict here
> $ hg merge
> merging b
> - warning: conflicts while merging b! (edit, then use 'hg resolve --mark')
> - 0 files updated, 0 files merged, 0 files removed, 1 files unresolved
> - use 'hg resolve' to retry unresolved file merges or 'hg merge --abort' to abandon
> - [1]
> - $ cat b
> - <<<<<<< working copy: b80e3e32f75a - test: c
> - a
> - z
> - a
> - ||||||| base
> - =======
> - a
> - a
> - a
> - b4
> - c
> - b5
> - >>>>>>> merge rev: 64afcdf8e29e - test: mergeb
> - $ cat <<EOF > b
> - > a
> - > z
> - > a
> - > b4
> - > c
> - > b5
> - > EOF
> - $ hg resolve --mark -q
> - $ rm b.orig
> + 0 files updated, 1 files merged, 0 files removed, 0 files unresolved
> + (branch merge, don't forget to commit)
> $ echo d >> b
> $ hg ci -mmerge2 -d '4 0'
>
> diff --git a/mercurial/copies.py b/mercurial/copies.py
>
> - a/mercurial/copies.py +++ b/mercurial/copies.py @@ -373,57 +373,6 @@
>
>
> return u1, u2
>
> -def _makegetfctx(ctx):
> - """return a 'getfctx' function suitable for _checkcopies usage
>
> - - We have to re-setup the function building 'filectx' for each - '_checkcopies' to ensure the linkrev adjustment is properly setup for - each. Linkrev adjustment is important to avoid bug in rename - detection. Moreover, having a proper '_ancestrycontext' setup ensures - the performance impact of this adjustment is kept limited. Without it, - each file could do a full dag traversal making the time complexity of - the operation explode (see issue4537). - - This function exists here mostly to limit the impact on stable. Feel - free to refactor on default. - """ - rev = ctx.rev() - repo = ctx._repo - ac = getattr(ctx, '_ancestrycontext', None) - if ac is None: - revs = [rev] - if rev is None: - revs = [p.rev() for p in ctx.parents()] - ac = repo.changelog.ancestors(revs, inclusive=True) - ctx._ancestrycontext = ac - def makectx(f, n): - if n in node.wdirfilenodeids: # in a working context? - if ctx.rev() is None: - return ctx.filectx(f) - return repo[None][f] - fctx = repo.filectx(f, fileid=n) - # setup only needed for filectx not create from a changectx - fctx._ancestrycontext = ac - fctx._descendantrev = rev - return fctx - return util.lrucachefunc(makectx) - -def _combinecopies(copyfrom, copyto, finalcopy, diverge, incompletediverge): - """combine partial copy paths""" - remainder = {} - for f in copyfrom: - if f in copyto: - finalcopy[copyto[f]] = copyfrom[f] - del copyto[f] - for f in incompletediverge: - assert f not in diverge - ic = incompletediverge[f] - if ic[0] in copyto: - diverge[f] = [copyto[ic[0]], ic[1]] - else: - remainder[f] = ic - return remainder -
>
> def mergecopies(repo, c1, c2, base):
> """
> Finds moves and copies between context c1 and c2 that are relevant for
> @@ -526,168 +475,93 @@
> This is pretty slow when a lot of changesets are involved but will track all
> the copies.
> """
> - # In certain scenarios (e.g. graft, update or rebase), base can be
> - # overridden We still need to know a real common ancestor in this case We
> - # can't just compute _c1.ancestor(_c2) and compare it to ca, because there
> - # can be multiple common ancestors, e.g. in case of bidmerge. Because our
> - # caller may not know if the revision passed in lieu of the CA is a genuine
> - # common ancestor or not without explicitly checking it, it's better to
> - # determine that here.
> - #
> - # base.isancestorof(wc) is False, work around that
> - _c1 = c1.p1() if c1.rev() is None else c1
> - _c2 = c2.p1() if c2.rev() is None else c2
> - # an endpoint is "dirty" if it isn't a descendant of the merge base
> - # if we have a dirty endpoint, we need to trigger graft logic, and also
> - # keep track of which endpoint is dirty
> - dirtyc1 = not base.isancestorof(_c1)
> - dirtyc2 = not base.isancestorof(_c2)
> - graft = dirtyc1 or dirtyc2
> - tca = base
> - if graft:
> - tca = _c1.ancestor(_c2)
>
> - - limit = _findlimit(repo, c1, c2) -
>
> m1 = c1.manifest()
> m2 = c2.manifest()
> mb = base.manifest()
>
> - # gather data from _checkcopies:
> - # - diverge = record all diverges in this dict
> - # - copy = record all non-divergent copies in this dict
> - # - fullcopy = record all copies in this dict
> - # - incomplete = record non-divergent partial copies here
> - # - incompletediverge = record divergent partial copies here
> - diverge = {} # divergence data is shared
> - incompletediverge = {}
> - data1 = {'copy': {},
> - 'fullcopy': {},
> - 'incomplete': {},
> - 'diverge': diverge,
> - 'incompletediverge': incompletediverge,
> - }
> - data2 = {'copy': {},
> - 'fullcopy': {},
> - 'incomplete': {},
> - 'diverge': diverge,
> - 'incompletediverge': incompletediverge,
> - }
> + copies1 = pathcopies(base, c1)
> + copies2 = pathcopies(base, c2)
> +
> + inversecopies1 = {}
> + inversecopies2 = {}
> + for dst, src in copies1.items():
> + inversecopies1.setdefault(src, []).append(dst)
> + for dst, src in copies2.items():
> + inversecopies2.setdefault(src, []).append(dst)
> +
> + copy = {}
> + diverge = {}
> + renamedelete = {}
> + allsources = set(inversecopies1) | set(inversecopies2)
> + for src in allsources:
> + dsts1 = inversecopies1.get(src)
> + dsts2 = inversecopies2.get(src)
> + if dsts1 and dsts2:
> + # copied/renamed on both sides
> + if src not in m1 and src not in m2:
> + # renamed on both sides
> + dsts1 = set(dsts1)
> + dsts2 = set(dsts2)
> + # If there's some overlap in the rename destinations, we
> + # consider it not divergent. For example, if side 1 copies 'a'
> + # to 'b' and 'c' and deletes 'a', and side 2 copies 'a' to 'c'
> + # and 'd' and deletes 'a'.
Formatting could help this comment a bit. something like
side 1: a -> b -> c
side 2: a -> c -> d
> + if dsts1 & dsts2:
> + for dst in (dsts1 & dsts2):
> + copy[dst] = src
> + else:
> + diverge[src] = sorted(dsts1 | dsts2)
> + elif src in m1 and src in m2:
> + # copied on both sides
> + dsts1 = set(dsts1)
> + dsts2 = set(dsts2)
> + for dst in (dsts1 & dsts2):
> + copy[dst] = src
> + # TODO: Handle cases where it was renamed on one side and copied
> + # on the other side
Is this TODO a regression or some ported limitation.
> + elif dsts1:
> + # copied/renamed only on side 1
> + if src not in m2:
> + # deleted on side 2
> + if src not in m1:
> + # renamed on side 1, deleted on side 2
> + renamedelete[src] = dsts1
> + elif m2[src] != mb[src]:
> + # modified on side 2
> + for dst in dsts1:
> + if dst not in m2:
> + # dst not added on side 2 (handle as regular
> + # "both created" case in manifestmerge in that case)
Can you elaborate a bit on what this case means (and how we deal with
it) especially, what happens if dst is indeed in m2 ?
> + copy[dst] = src
> + elif dsts2:
> + # copied/renamed only on side 2
> + if src not in m1:
> + # deleted on side 1
> + if src not in m2:
> + # renamed on side 2, deleted on side 1
> + renamedelete[src] = dsts2
> + elif m1[src] != mb[src]:
> + # modified on side 1
> + for dst in dsts2:
> + if dst not in m1:
> + # dst not added on side 1 (handle as regular
> + # "both created" case in manifestmerge in that case)
> + copy[dst] = src
Since this seems the very same code as the previous clause, I wonder if
we could factor it out. This would help to prevent subtle bug when we
update it. (the answer might be "no because python is slow).
> + renamedeleteset = set()
> + divergeset = set()
> + for src, dsts in diverge.items():
> + divergeset.update(dsts)
> + for src, dsts in renamedelete.items():
> + renamedeleteset.update(dsts)
small nits: Is there any reason not to use diverge.values and
renamedelete.values here ?
REPOSITORY
rHG Mercurial
REVISION DETAIL
https://phab.mercurial-scm.org/D6255
To: martinvonz, #hg-reviewers
Cc: marmoute, mjpieters, mercurial-devel
More information about the Mercurial-devel
mailing list