A thought for dealing with linkrevs

Matt Mackall mpm at selenic.com
Wed Sep 17 23:08:30 CDT 2014


So I've been thinking about the linkrev problem a bit again. I've
occasionally suggested we can deduce the possible positions of the
missing linkrevs from the shape of the changelog graph as compared to
the filelog graph. This turned out to be complicated so it never really
went anywhere.

I'm now tinkering with the idea of treating the linkrev as a simple
hint: "this is one possibility, and any aliases are numerically after
it". So if we're interested in following the history of file F from
changeset C, we can say:

- is F.linkrev an ancestor of C? great, done
- otherwise, let's look in that vicinity for changesets touching F

Since the usual case for caring about this is log -f or annotate, which
incrementally walk back through file history, we can imagine a
"corrector" object that knows where we're following from and can correct
related linkrevs. We can even imagine this state being automatically
shared as we traverse contexts, thus making fctx.linkrev() magically do
the right thing.

I tried hacking this up, and the below code will incrementally correct
all the linkrevs in the 2639 ancestors of mercurial/commands.py in
0.13s, including fixing up a ton of obsolete linkrevs, which seems like
a pretty good start. There are probably a bunch of ways it could be
faster/smarter.

(One can also imagine an alternate corrector that uses a persistent
cache.)

diff -r 48791c2bea1c mercurial/context.py
--- a/mercurial/context.py	Tue Sep 16 14:49:56 2014 -0500
+++ b/mercurial/context.py	Wed Sep 17 22:32:49 2014 -0500
@@ -17,6 +17,75 @@
 
 propertycache = util.propertycache
 
+class linkcorrector(object):
+    """report (an) appropriate linkrev for file revisions in ancestors of
+    ctx
+    """
+    def __init__(self, ctx, filelog):
+        self._start = ctx
+        self._filelog = filelog
+        self._linkcache = {} # filerev -> rev
+        self._visit = []
+        self._scanpos = ctx.rev() + 1
+        self._ancestors = set([ctx.rev()]) # known ancestors
+
+    def correct(self, fctx):
+        fr = fctx._filerev
+        linkcache = self._linkcache
+
+        # fastest path: already cached
+        if fr in linkcache:
+            return linkcache[fr]
+
+        # get the actual linkrev
+        flog = self._filelog
+        lr = flog.linkrev(fr)
+
+        # if linkrev is a known ancestor, return immediately
+        ancestors = self._ancestors
+        if lr in ancestors:
+            linkcache[fr] = lr
+            return lr
+
+        # find new revs visit (without unpacking)
+        # if we get lucky, we'll hit linkrev and be done
+        repo = self._start._repo
+        parentrevs = repo.changelog.parentrevs
+        vadd = self._visit.append
+        while True:
+            r = self._scanpos = self._scanpos - 1
+            if r in ancestors:
+                for p in parentrevs(r):
+                    ancestors.add(p)
+                if r == lr:
+                    # good, cache and return
+                    linkcache[fr] = lr
+                    return lr
+                # examine later
+                vadd(r)
+            if r <= lr or r == 0:
+                break
+
+        # check possible revs
+        path = fctx._path
+        fnode = fctx._filenode
+        visit = self._visit
+        while visit:
+            # these are added breadfirst, take ones from the end,
+            # hopefully closest to linkrev
+            r = visit.pop()
+            ctx = repo[r]
+            if path in ctx.files():
+                # maybe fast-pathed
+                newfnode = ctx.filenode(path)
+                if fnode == newfnode:
+                    # hit!
+                    linkcache[fr] = r
+                    return r
+                else:
+                    # miss, cache for later
+                    linkcache[flog.rev(newfnode)] = r
+
 class basectx(object):
     """A basectx object represents the common logic for its children:
     changectx: read-only context that is already present in the repo,

-- 
Mathematics is the supreme nostalgia of our time.




More information about the Mercurial-devel mailing list