[PATCH 2 of 2] strip: make tree stripping O(changes) instead of O(repo)

Durham Goode durham at fb.com
Mon May 8 14:40:03 EDT 2017


# HG changeset patch
# User Durham Goode <durham at fb.com>
# Date 1494268523 25200
#      Mon May 08 11:35:23 2017 -0700
# Node ID 74881b9a39b2bab273d09009385e3c9ca717a13a
# Parent  5dec5907fe49a488d3ade272d4a5cf090914e59c
strip: make tree stripping O(changes) instead of O(repo)

The old tree stripping logic iterated over every tree revlog in the repo,
looking for revisions to strip. That is very inefficient in large repos.
Instead, look at which files the strip touches and inspect only the tree
revlogs for the directories containing them.

I don't have actual perf numbers, since internally we don't use a true
treemanifest, but simply iterating over hundreds of thousands of revlogs takes
many, many seconds, so this should help tremendously when stripping only a few
commits.

diff --git a/mercurial/repair.py b/mercurial/repair.py
--- a/mercurial/repair.py
+++ b/mercurial/repair.py
@@ -238,11 +238,12 @@ def strip(ui, repo, nodelist, backup=Tru
 def striptrees(repo, tr, striprev, files):
     if 'treemanifest' in repo.requirements: # safe but unnecessary
                                             # otherwise
-        for unencoded, encoded, size in repo.store.datafiles():
-            if (unencoded.startswith('meta/') and
-                unencoded.endswith('00manifest.i')):
-                dir = unencoded[5:-12]
-                repo.manifestlog._revlog.dirlog(dir).strip(striprev, tr)
+        treerevlog = repo.manifestlog._revlog
+        for dir in util.dirs(files):
+            # If the revlog doesn't exist, this returns an empty revlog and is a
+            # no-op.
+            rl = treerevlog.dirlog(dir)
+            rl.strip(striprev, tr)
 
 def rebuildfncache(ui, repo):
     """Rebuilds the fncache file from repo history.

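For readers following the change, here is a minimal standalone sketch of the idea, not Mercurial's actual code. The helper names `dirs_from_store` and `touched_dirs` are hypothetical. The first mimics the removed loop (filter every store file name, then slice off `meta/` and `00manifest.i`); the second derives the candidate directories from the touched files alone. The sketch emits directory names with a trailing slash because that is what the `unencoded[5:-12]` slice in the removed lines produces; whether `util.dirs` yields names in the same form is not stated in the patch and is an assumption here.

```python
def dirs_from_store(store_names):
    """Old approach: scan every store file name, keep tree manifest revlogs.

    Cost grows with the number of files in the store.
    """
    return {
        name[5:-12]  # drop the 'meta/' prefix and the '00manifest.i' suffix
        for name in store_names
        if name.startswith('meta/') and name.endswith('00manifest.i')
    }


def touched_dirs(files):
    """New approach: collect every ancestor directory of the touched files.

    Cost grows with the number of touched files and their path depth.
    Names carry a trailing slash; the repository root is not included.
    """
    dirs = set()
    for path in files:
        pos = path.rfind('/')
        while pos != -1:
            dirs.add(path[:pos + 1])
            pos = path.rfind('/', 0, pos)
    return dirs


# Hypothetical store contents and a strip that touches a single file.
store = [
    '00manifest.i',
    'meta/a/00manifest.i',
    'meta/a/b/00manifest.i',
    'meta/c/00manifest.i',
    'data/a/b/f.txt.i',
]
all_dirs = dirs_from_store(store)
strip_dirs = touched_dirs(['a/b/f.txt'])
```

With this input the store scan finds three tree revlogs, while the strip only needs to look at the two directories above `a/b/f.txt`; the unrelated `c/` revlog is never opened.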
