[PATCH 4 of 6] revbranchcache: populate cache incrementally

Durham Goode durham at fb.com
Mon Mar 16 22:36:16 CDT 2015


# HG changeset patch
# User Durham Goode <durham at fb.com>
# Date 1423627487 28800
#      Tue Feb 10 20:04:47 2015 -0800
# Node ID af07ccb20f1f8b31fdc2a4e482d4872501a98190
# Parent  8e61a4ec3f3d7d6af1ac5d31094c69173862f972
revbranchcache: populate cache incrementally

Previously the cache would populate completely the first time it was accessed.
This could take over a minute on larger repos. This patch changes it to update
incrementally.  Only values that are read will be written, and it will only
rewrite as much of the file as strictly necessary.

This adds a magic value of '\0\0\0\0' to represent an empty cache entry. The
probability of this matching an actual commit hash prefix is tiny, so it's ok if
that's always considered a cache miss. This is also BC safe since any existing
entries with '\0\0\0\0' will just be considered misses.

Perf numbers:

Mozilla-central: hg --time log -r 'branch(mobile)' -T.
Cold Cache: 14.7s -> 15.1s (3% worse)
Warm Cache: 1.6s -> 2.1s (30% worse)

Mozilla-cental: hg perfbranchmap
2s -> 2.4s (20% worse)

hg: hg log -r 'branch(stable) & branch(default)'
Cold Cache: 3.1s -> 1.9s (40% better - because the old code missed the cache on
both branch() revset iterations, so it did twice the work)
Warm Cache: 0.2 -> 0.26 (30% worse)

internal huge repo: hg --time log -r 'tip & branch(default)'
Cold Cache: 65.4s -> 0.2s (327x better)

While this change introduces minor regressions when iterating over every commit
in a branch, it massively improves the cold cache time for operations which
touch a single commit. I feel the better O() is worth it in this case.

diff --git a/mercurial/branchmap.py b/mercurial/branchmap.py
--- a/mercurial/branchmap.py
+++ b/mercurial/branchmap.py
@@ -367,11 +367,8 @@ class revbranchcache(object):
 
         # if requested rev is missing, add and populate all missing revs
         if len(self._rbcrevs) < rbcrevidx + _rbcrecsize:
-            first = len(self._rbcrevs) // _rbcrecsize
             self._rbcrevs.extend('\0' * (len(changelog) * _rbcrecsize -
                                          len(self._rbcrevs)))
-            for r in xrange(first, len(changelog)):
-                self._branchinfo(r)
 
         # fast path: extract data from cache, use it if node is matching
         reponode = changelog.node(rev)[:_rbcnodelen]
@@ -380,10 +377,17 @@ class revbranchcache(object):
         close = bool(branchidx & _rbccloseflag)
         if close:
             branchidx &= _rbcbranchidxmask
-        if cachenode == reponode:
+        if cachenode == '\0\0\0\0':
+            pass
+        elif cachenode == reponode:
             return self._names[branchidx], close
+        else:
+            # rev/node map has changed, invalidate the cache from here up
+            truncate = rbcrevidx + _rbcrecsize
+            del self._rbcrevs[truncate:]
+            self._rbcrevslen = min(self._rbcrevslen, truncate)
+
         # fall back to slow path and make sure it will be written to disk
-        self._rbcrevslen = min(self._rbcrevslen, rev)
         return self._branchinfo(rev)
 
     def _branchinfo(self, rev):
@@ -408,6 +412,7 @@ class revbranchcache(object):
         rec = array('c')
         rec.fromstring(pack(_rbcrecfmt, node, branchidx))
         self._rbcrevs[rbcrevidx:rbcrevidx + _rbcrecsize] = rec
+        self._rbcrevslen = min(self._rbcrevslen, rev)
 
     def write(self):
         """Save branch cache if it is dirty."""


More information about the Mercurial-devel mailing list