[PATCH 5 of 6 frozen-repos] localrepo: evaluate revsets against frozen repos

Sat Nov 21 19:14:02 CST 2015

# HG changeset patch
# User Gregory Szorc <gregory.szorc at gmail.com>
# Date 1448146482 28800
#      Sat Nov 21 14:54:42 2015 -0800
# Node ID ad891b362564b30ec69bc844a3fd98e73b6e032e
# Parent  888c2171adffa8340406b50aae02375f7bef50f4
localrepo: evaluate revsets against frozen repos

Previously, revsets were evaluated against a repository/changelog that
could change. This felt wrong. And, changectx lookups during revset
evaluation would result in repoview constantly performing changelog
correctness checks, adding overhead.

This patch results in some significant performance wins, especially
when changectx are involved. There are some minor regressions, but
the absolute time increase is so small that they can arguably be
ignored. A detailed analysis follows.

Running the revset benchmarks in default mode of returning integers,
we see some interesting changes:

revset #1: draft()
   plain
0) 0.000040
1) 0.000053 132%

   plain
0) 0.000233
1) 0.000236

revset #7: author(lmoscovicz)
   plain
0) 0.994968
1) 0.702156  70%

revset #8: author(mpm)
   plain
0) 0.982039
1) 0.696124  70%

revset #9: author(lmoscovicz) or author(mpm)
   plain
0) 1.944505
1) 1.372315  70%

revset #10: author(mpm) or author(lmoscovicz)
   plain
0) 1.970464
1) 1.393157  70%

revset #13: roots((tip~100::) - (tip~100::tip))
   plain
0) 0.000636
1) 0.000603  94%

revset #15: 42:68 and roots(42:tip)
   plain
0) 0.000226
1) 0.000178  78%

revset #19: draft()
   plain
0) 0.000040
1) 0.000056 140%

revset #22: (not public() - obsolete())
   plain
0) 0.000088
1) 0.000111 126%

revset #23: (_intlist('20000\x0020001')) and merge()
   plain
0) 0.000066
1) 0.000086 130%

First, the improvements. revsets with author() improved significantly.
The reason is that unlike most changesets, author() needs to obtain a
changectx to inspect the author field. And since repo.changelog lookups
are faster, that revset function became faster.

Now, the regressions. The percentages here are concerning. However, when
you look at the absolute values, we're going from e.g. 40us/call to
53us/call. Across the board, very fast revsets regressed by 10-20us.
Profiling reveals this regression is because of the creation and
instantiation of the new frozen repo class. The very act of dynamically
defining and then instantiating *any* proxy class appears to add this
overhead: even if the class's __init__ does practically nothing. While
there is a regression here, the wall values are so small that I don't
think it is concerning. It's also worth noting that if I modify
`hg perfrevset` to re-use the same frozen repo instance (instead of
instantiating a new one every benchmark loop), benchmark times improve
across the board! So, as more internal consumers start using frozen
repo consumers, there will be fewer frozen repo classes instantiated
and less of a performance penalty. Of course, a penalty of 20us is
probably not worrying about.

When we benchmark revsets obtaining changectxs, we see a more drastic
improvement:

revset #0: all()
   plain
0) 0.164450
1) 0.057012  34%

revset #1: draft()
   plain
0) 0.000131
1) 0.000091  69%

revset #2: ::tip
   plain
0) 0.202160
1) 0.094987  46%

revset #4: ::tip and draft()
   plain
0) 0.000269
1) 0.000251  93%

revset #5: 0::tip
   plain
0) 0.165030
1) 0.056363  34%

revset #7: author(lmoscovicz)
   plain
0) 0.997949
1) 0.727015  72%

revset #8: author(mpm)
   plain
0) 1.015544
1) 0.743669  73%

revset #9: author(lmoscovicz) or author(mpm)
   plain
0) 2.087182
1) 1.483129  71%

revset #10: author(mpm) or author(lmoscovicz)
   plain
0) 2.087200
1) 1.482509  71%

revset #11: tip:0
   plain
0) 0.169505
1) 0.056132  33%

revset #12: 0::
   plain
0) 0.210607
1) 0.089506  42%

revset #13: roots((tip~100::) - (tip~100::tip))
   plain
0) 0.000701
1) 0.000594  84%

revset #15: 42:68 and roots(42:tip)
   plain
0) 0.000234
1) 0.000177  75%

revset #16: ::p1(p1(tip))::
   plain
0) 0.266980
1) 0.144333  54%

revset #17: public()
   plain
0) 0.190038
1) 0.068690  36%

revset #18: :10000 and public()
   plain
0) 0.070399
1) 0.026308  37%

revset #19: draft()
   plain
0) 0.000131
1) 0.000090  68%

revset #22: (not public() - obsolete())
   plain
0) 0.000187
1) 0.000146  78%

revset #23: (_intlist('20000\x0020001')) and merge()
   plain
0) 0.000067
1) 0.000083 123%

revset #25: (20000::) - (20000)
   plain
0) 0.073760
1) 0.044524  60%

revset #26: (children(ancestor(tip~5, tip)) and ::(tip~5))::
   plain
0) 0.041947
1) 0.039813  94%

The one regression in revset #23 is likely probe overhead and normal
variation due to the extremely small times involved.

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -530,26 +530,27 @@ class localrepository(object):
         The revset is specified as a string ``expr`` that may contain
         %-formatting to escape certain types. See ``revset.formatspec``.
 
         Return a revset.abstractsmartset, which is a list-like interface
         that contains integer revisions.
         '''
         expr = revset.formatspec(expr, *args)
         m = revset.match(None, expr)
-        return m(self)
+        return m(self.frozen())
 
     def set(self, expr, *args):
         '''Find revisions matching a revset and emit changectx instances.
 
         This is a convenience wrapper around ``revs()`` that iterates the
         result and is a generator of changectx instances.
         '''
-        for r in self.revs(expr, *args):
-            yield self[r]
+        frozen = self.frozen()
+        for r in frozen.revs(expr, *args):
+            yield frozen[r]
 
     def url(self):
         return 'file:' + self.root
 
     def hook(self, name, throw=False, **args):
         """Call a hook, passing this repo instance.
 
         This a convenience method to aid invoking hooks. Extensions likely