[PATCH 3 of 3] localrepo: cache types for filtered repos (issue5043)

Gregory Szorc gregory.szorc at gmail.com
Fri May 26 20:42:36 EDT 2017


# HG changeset patch
# User Gregory Szorc <gregory.szorc at gmail.com>
# Date 1495845557 25200
#      Fri May 26 17:39:17 2017 -0700
# Node ID 1755f6e0f866afa6bbc20b688fab2995f4bca566
# Parent  9a7f60cb77b6969608d2a9b72a2ccb31a53db82b
localrepo: cache types for filtered repos (issue5043)

For reasons I can't explain, localrepository.filtered() leaks the
dynamically created type. As far as I can tell, this is due to a
cycle between the dynamically generated type and its .__bases__
tuple. This /should/ get garbage collected once all references
to both objects are in the garbage list. However, when running
`hg convert` with garbage collection debugging enabled,
objgraph.show_backrefs() shows all references outside of gc.garbage
are in fact gone. However, these objects aren't being collected!
I spent a few hours trying to track this down and didn't have any
luck. But I'm 100% confident we're leaking the dymaically generated
type and 2 tuples of types on every revision that `hg convert`
processes.

Unable to stop the leak, the next best thing is to contain it.

This patch adds a cache of of the dynamically generated repoview/filter
types on the localrepo object. Since we only generate each type
once, we cap the amount of memory that can leak to something
reasonable.

As the inline comment explains, there is danger to caching the type
if the __class__ of the repo is updated after first call. Code to
validate this doesn't happen has been added. It is possible this may
break some extensions. If it does, I think they deserve to break,
as mutating the __class__ of a repo is dangerous and should be done
as early as possible and only in reposetup().

After this change, `hg convert` no longer leaks memory on every
revision. The process will likely grow memory usage over time due
to e.g. larger manifests. But there are no leaks.

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -396,6 +396,10 @@ class localrepository(object):
         # - bookmark changes
         self.filteredrevcache = {}
 
+        # Cache of types representing filtered repos.
+        self._filteredexpectedbasetype = None
+        self._filteredrepotypes = {}
+
         # generic mapping between names and nodes
         self.names = namespaces.namespaces()
 
@@ -493,11 +497,31 @@ class localrepository(object):
 
     def filtered(self, name):
         """Return a filtered version of a repository"""
+        # The dynamically created types can easily leak. To curtail this,
+        # we cache the types so we're not always creating them. The main
+        # risk to caching is that the __class__ of the original repo
+        # changes after caching. This should never happen, as __class__
+        # adjustment should only be performed in reposetup() functions
+        # in extensions *before* any kind of operation that accesses
+        # repoviews. So we check for this explicitly and error when
+        # extensions are misbehaving.
+        unfiltered = self.unfiltered()
+
+        if self._filteredexpectedbasetype is None:
+            self._filteredexpectedbasetype = unfiltered.__class__
+        else:
+            if self._filteredexpectedbasetype != unfiltered.__class__:
+                raise error.ProgrammingError('__class__ of repo modified after '
+                                             'call to filtered()')
+
         # build a new class with the mixin and the current class
         # (possibly subclass of the repo)
-        bases = (repoview.repoview, self.unfiltered().__class__)
-        cls = type('%sfilteredrepo' % name, bases, {})
-        return cls(self, name)
+        if name not in self._filteredrepotypes:
+            bases = (repoview.repoview, unfiltered.__class__)
+            cls = type('%sfilteredrepo' % name, bases, {})
+            self._filteredrepotypes[name] = cls
+
+        return self._filteredrepotypes[name](self, name)
 
     @repofilecache('bookmarks', 'bookmarks.current')
     def _bookmarks(self):
diff --git a/mercurial/statichttprepo.py b/mercurial/statichttprepo.py
--- a/mercurial/statichttprepo.py
+++ b/mercurial/statichttprepo.py
@@ -164,6 +164,9 @@ class statichttprepository(localrepo.loc
         self.encodepats = None
         self.decodepats = None
         self._transref = None
+        # Cache of types representing filtered repos.
+        self._filteredexpectedbasetype = None
+        self._filteredrepotypes = {}
 
     def _restrictcapabilities(self, caps):
         caps = super(statichttprepository, self)._restrictcapabilities(caps)


More information about the Mercurial-devel mailing list