[PATCH V2] localrepo: cache types for filtered repos (issue5043)

Gregory Szorc gregory.szorc at gmail.com
Sun Jul 2 03:52:12 UTC 2017


# HG changeset patch
# User Gregory Szorc <gregory.szorc at gmail.com>
# Date 1498967479 25200
#      Sat Jul 01 20:51:19 2017 -0700
# Node ID 79e1b61f921b81b59d791c26a92120875020b2b3
# Parent  6d678ab1b10d0fddc73003d21aa3c7ec43194e2e
localrepo: cache types for filtered repos (issue5043)

Python introduces a reference cycle on dynamically created types
via __mro__, making them very easy to leak. See
https://bugs.python.org/issue17950.

Previously, repo.filtered() created a type on every invocation.
Long-running processes (like `hg convert`) could call this
function thousands of times, leading to a steady memory leak.

Since we're Unable to stop the leak because this is a bug in
Python, the next best thing is to contain it.

This patch adds a cache of of the dynamically generated repoview/filter
types on the localrepo object. Since we only generate each type
once, we cap the amount of memory that can leak to something
reasonable.

After this change, `hg convert` no longer leaks memory on every
revision. The process will likely grow memory usage over time due
to e.g. larger manifests. But there are no leaks.

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -400,6 +400,9 @@ class localrepository(object):
         # post-dirstate-status hooks
         self._postdsstatus = []
 
+        # Cache of types representing filtered repos.
+        self._filteredrepotypes = weakref.WeakKeyDictionary()
+
         # generic mapping between names and nodes
         self.names = namespaces.namespaces()
 
@@ -501,11 +504,21 @@ class localrepository(object):
 
     def filtered(self, name):
         """Return a filtered version of a repository"""
-        # build a new class with the mixin and the current class
-        # (possibly subclass of the repo)
-        class filteredrepo(repoview.repoview, self.unfiltered().__class__):
-            pass
-        return filteredrepo(self, name)
+        # Python <3.4 easily leaks types via __mro__. See
+        # https://bugs.python.org/issue17950. We cache dynamically
+        # created types so this method doesn't leak on every
+        # invocation.
+
+        key = self.unfiltered().__class__
+        if key not in self._filteredrepotypes:
+            # Build a new type with the repoview mixin and the base
+            # class of this repo. Give it a name containing the
+            # filter name to aid debugging.
+            bases = (repoview.repoview, key)
+            cls = type('%sfilteredrepo' % name, bases, {})
+            self._filteredrepotypes[key] = cls
+
+        return self._filteredrepotypes[key](self, name)
 
     @repofilecache('bookmarks', 'bookmarks.current')
     def _bookmarks(self):
diff --git a/mercurial/statichttprepo.py b/mercurial/statichttprepo.py
--- a/mercurial/statichttprepo.py
+++ b/mercurial/statichttprepo.py
@@ -165,6 +165,8 @@ class statichttprepository(localrepo.loc
         self.encodepats = None
         self.decodepats = None
         self._transref = None
+        # Cache of types representing filtered repos.
+        self._filteredrepotypes = {}
 
     def _restrictcapabilities(self, caps):
         caps = super(statichttprepository, self)._restrictcapabilities(caps)


More information about the Mercurial-devel mailing list