D7069: copies: simplify the handling of merges

marmoute (Pierre-Yves David) phabricator at mercurial-scm.org
Sat Oct 12 16:46:16 UTC 2019


marmoute created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Instead of stacking copies for both parent on the head, we move copies outside
  of the heap into a dedicated dictionary. The two side of merge can we merged
  sooner, making the algorithm simpler.
  
  This simplicity reflect in the heap structure and speed up the execution for
  copies involving a large amount of merges.
  
  Here are timing for perfpathcopies of multiple revision pairs.
  
  - filelog: timing using filelog (with the introrev condition dropped)
  - base:    this series base
  - before:  the parent of this changeset
  - after:   this changeset
  
  revision: large amount; added files: large amount; rename small amount; c3b14617fbd7 9ba6ab77fd29
  filelog: ! wall 3.679613 comb 3.680000 user 3.580000 sys 0.100000 (median of 3)
  base:    ! wall 8.884369 comb 8.880000 user 8.850000 sys 0.030000 (median of 3)
  before:  ! wall 8.443747 comb 8.420000 user 8.410000 sys 0.010000 (median of 3)
  after:   ! wall 4.697917 comb 4.690000 user 4.660000 sys 0.030000 (median of 3)
  revision: large amount; added files: small amount; rename small amount; c3b14617fbd7 f650a9b140d2
  filelog: ! wall 0.003357 comb 0.010000 user 0.010000 sys 0.000000 (median of 781)
  base:    ! wall 12.398524 comb 12.400000 user 12.330000 sys 0.070000 (median of 3)
  before:  ! wall 10.852593 comb 10.850000 user 10.800000 sys 0.050000 (median of 3)
  after:   ! wall 6.750832 comb 6.750000 user 6.640000 sys 0.110000 (median of 3)
  revision: large amount; added files: large amount; rename large amount; 08ea3258278e d9fa043f30c0
  filelog: ! wall 2.754687 comb 2.760000 user 2.650000 sys 0.110000 (median of 4)
  base:    ! wall 1.423166 comb 1.420000 user 1.400000 sys 0.020000 (median of 8)
  before:  ! wall 1.068041 comb 1.060000 user 1.050000 sys 0.010000 (median of 10)
  after:   ! wall 1.045916 comb 1.050000 user 1.040000 sys 0.010000 (median of 10)
  revision: small amount; added files: large amount; rename large amount; df6f7a526b60 a83dc6a2d56f
  filelog: ! wall 1.552293 comb 1.550000 user 1.510000 sys 0.040000 (median of 6
  base:    ! wall 0.022662 comb 0.020000 user 0.020000 sys 0.000000 (median of 128)
  before:  ! wall 0.021111 comb 0.020000 user 0.020000 sys 0.000000 (median of 139)
  after:   ! wall 0.021577 comb 0.020000 user 0.020000 sys 0.000000 (median of 138)
  revision: small amount; added files: large amount; rename small amount; 4aa4e1f8e19a 169138063d63
  filelog: ! wall 1.500983 comb 1.500000 user 1.420000 sys 0.080000 (median of 7)
  base:    ! wall 0.006956 comb 0.010000 user 0.010000 sys 0.000000 (median of 392)
  before:  ! wall 0.004356 comb 0.010000 user 0.010000 sys 0.000000 (median of 675)
  after:   ! wall 0.004329 comb 0.000000 user 0.000000 sys 0.000000 (median of 682)
  revision: small amount; added files: small amount; rename small amount; 4bc173b045a6 964879152e2e
  filelog: ! wall 0.011745 comb 0.020000 user 0.020000 sys 0.000000 (median of 250)
  base:    ! wall 0.000156 comb 0.000000 user 0.000000 sys 0.000000 (median of 17180)
  before:  ! wall 0.000100 comb 0.000000 user 0.000000 sys 0.000000 (median of 26912)
  after:   ! wall 0.000105 comb 0.000000 user 0.000000 sys 0.000000 (median of 25689)
  revision: medium amount; added files: large amount; rename medium amount; c95f1ced15f2 2c68e87c3efe
  filelog: ! wall 3.228230 comb 3.230000 user 3.110000 sys 0.120000 (median of 4)
  base:    ! wall 0.997640 comb 1.000000 user 0.980000 sys 0.020000 (median of 10)
  before:  ! wall 0.778291 comb 0.780000 user 0.780000 sys 0.000000 (median of 13)
  after:   ! wall 0.706594 comb 0.710000 user 0.710000 sys 0.000000 (median of 15)
  revision: medium amount; added files: medium amount; rename small amount; d343da0c55a8 d7746d32bf9d
  filelog: ! wall 1.052501 comb 1.060000 user 1.040000 sys 0.020000 (median of 10
  base:    ! wall 0.214519 comb 0.220000 user 0.220000 sys 0.000000 (median of 45)
  before:  ! wall 0.160804 comb 0.160000 user 0.160000 sys 0.000000 (median of 62)
  after:   ! wall 0.163736 comb 0.160000 user 0.160000 sys 0.000000 (median of 60)

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D7069

AFFECTED FILES
  mercurial/copies.py

CHANGE DETAILS

diff --git a/mercurial/copies.py b/mercurial/copies.py
--- a/mercurial/copies.py
+++ b/mercurial/copies.py
@@ -282,30 +282,13 @@
                 children[p].append(r)
 
     roots = set(children) - set(missingrevs)
-    # 'work' contains 3-tuples of a (revision number, parent number, copies).
-    # The parent number is only used for knowing which parent the copies dict
-    # came from.
-    # NOTE: To reduce costly copying the 'copies' dicts, we reuse the same
-    # instance for *one* of the child nodes (the last one). Once an instance
-    # has been put on the queue, it is thus no longer safe to modify it.
-    # Conversely, it *is* safe to modify an instance popped off the queue.
-    work = [(r, 1, {}) for r in roots]
+    work = [r for r in roots]
+    all_copies = dict((r, {}) for r in roots)
     heapq.heapify(work)
     alwaysmatch = match.always()
     while work:
-        r, i1, copies = heapq.heappop(work)
-        if work and work[0][0] == r:
-            # We are tracing copies from both parents
-            r, i2, copies2 = heapq.heappop(work)
-            for dst, src in copies2.items():
-                # Unlike when copies are stored in the filelog, we consider
-                # it a copy even if the destination already existed on the
-                # other branch. It's simply too expensive to check if the
-                # file existed in the manifest.
-                if dst not in copies:
-                    # If it was copied on the p1 side, leave it as copied from
-                    # that side, even if it was also copied on the p2 side.
-                    copies[dst] = copies2[dst]
+        r = heapq.heappop(work)
+        copies = all_copies.pop(r)
         if r == b.rev():
             return copies
         for i, c in enumerate(children[r]):
@@ -331,7 +314,28 @@
             for f in childctx.filesremoved():
                 if f in newcopies:
                     del newcopies[f]
-            heapq.heappush(work, (c, parent, newcopies))
+            othercopies = all_copies.get(c)
+            if othercopies is None:
+                heapq.heappush(work, c)
+                all_copies[c] = newcopies
+            else:
+                # we are the second parent to work on c, we need to merge our
+                # work with the other.
+                #
+                # Unlike when copies are stored in the filelog, we consider
+                # it a copy even if the destination already existed on the
+                # other branch. It's simply too expensive to check if the
+                # file existed in the manifest.
+                #
+                # In case of conflict, parent 1 take precedence over parent 2.
+                # This is an arbitrary choice made anew when implementing
+                # changeset based copies. It was made without regards with
+                # potential filelog related behavior.
+                if parent == 1:
+                    othercopies.update(newcopies)
+                else:
+                    newcopies.update(othercopies)
+                    all_copies[c] = newcopies
     assert False
 
 



To: marmoute, #hg-reviewers
Cc: mercurial-devel


More information about the Mercurial-devel mailing list