[PATCH 3 of 3] localrepo: cache types for filtered repos (issue5043)
Jun Wu
quark at fb.com
Fri May 26 23:55:17 EDT 2017
The reason of leaking is CLASS.__mro__ has a reference cycle. That could be
detected by my leak detection tool [2], or see comment at [1].
I'm very sad knowing the fact that __mro__ is an instant cycle in Python 2
world given the fact we use that pattern frequently to wrap ui or repo
objects. The best way I can think of to break the cycle without having to
use Python 3 is to write some low-level CPython code and I'm not sure
whether that will work yet.
[1]: http://bugs.python.org/msg60985
[2]: https://patchwork.mercurial-scm.org/patch/20886/
Excerpts from Gregory Szorc's message of 2017-05-26 17:42:36 -0700:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1495845557 25200
> # Fri May 26 17:39:17 2017 -0700
> # Node ID 1755f6e0f866afa6bbc20b688fab2995f4bca566
> # Parent 9a7f60cb77b6969608d2a9b72a2ccb31a53db82b
> localrepo: cache types for filtered repos (issue5043)
>
> For reasons I can't explain, localrepository.filtered() leaks the
> dynamically created type. As far as I can tell, this is due to a
> cycle between the dynamically generated type and its .__bases__
> tuple. This /should/ get garbage collected once all references
> to both objects are in the garbage list. However, when running
> `hg convert` with garbage collection debugging enabled,
> objgraph.show_backrefs() shows all references outside of gc.garbage
> are in fact gone. However, these objects aren't being collected!
> I spent a few hours trying to track this down and didn't have any
> luck. But I'm 100% confident we're leaking the dymaically generated
> type and 2 tuples of types on every revision that `hg convert`
> processes.
>
> Unable to stop the leak, the next best thing is to contain it.
>
> This patch adds a cache of of the dynamically generated repoview/filter
> types on the localrepo object. Since we only generate each type
> once, we cap the amount of memory that can leak to something
> reasonable.
>
> As the inline comment explains, there is danger to caching the type
> if the __class__ of the repo is updated after first call. Code to
> validate this doesn't happen has been added. It is possible this may
> break some extensions. If it does, I think they deserve to break,
> as mutating the __class__ of a repo is dangerous and should be done
> as early as possible and only in reposetup().
>
> After this change, `hg convert` no longer leaks memory on every
> revision. The process will likely grow memory usage over time due
> to e.g. larger manifests. But there are no leaks.
>
> diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
> --- a/mercurial/localrepo.py
> +++ b/mercurial/localrepo.py
> @@ -396,6 +396,10 @@ class localrepository(object):
> # - bookmark changes
> self.filteredrevcache = {}
>
> + # Cache of types representing filtered repos.
> + self._filteredexpectedbasetype = None
> + self._filteredrepotypes = {}
> +
> # generic mapping between names and nodes
> self.names = namespaces.namespaces()
>
> @@ -493,11 +497,31 @@ class localrepository(object):
>
> def filtered(self, name):
> """Return a filtered version of a repository"""
> + # The dynamically created types can easily leak. To curtail this,
> + # we cache the types so we're not always creating them. The main
> + # risk to caching is that the __class__ of the original repo
> + # changes after caching. This should never happen, as __class__
> + # adjustment should only be performed in reposetup() functions
> + # in extensions *before* any kind of operation that accesses
> + # repoviews. So we check for this explicitly and error when
> + # extensions are misbehaving.
> + unfiltered = self.unfiltered()
> +
> + if self._filteredexpectedbasetype is None:
> + self._filteredexpectedbasetype = unfiltered.__class__
> + else:
> + if self._filteredexpectedbasetype != unfiltered.__class__:
> + raise error.ProgrammingError('__class__ of repo modified after '
> + 'call to filtered()')
> +
> # build a new class with the mixin and the current class
> # (possibly subclass of the repo)
> - bases = (repoview.repoview, self.unfiltered().__class__)
> - cls = type('%sfilteredrepo' % name, bases, {})
> - return cls(self, name)
> + if name not in self._filteredrepotypes:
> + bases = (repoview.repoview, unfiltered.__class__)
> + cls = type('%sfilteredrepo' % name, bases, {})
> + self._filteredrepotypes[name] = cls
> +
> + return self._filteredrepotypes[name](self, name)
>
> @repofilecache('bookmarks', 'bookmarks.current')
> def _bookmarks(self):
> diff --git a/mercurial/statichttprepo.py b/mercurial/statichttprepo.py
> --- a/mercurial/statichttprepo.py
> +++ b/mercurial/statichttprepo.py
> @@ -164,6 +164,9 @@ class statichttprepository(localrepo.loc
> self.encodepats = None
> self.decodepats = None
> self._transref = None
> + # Cache of types representing filtered repos.
> + self._filteredexpectedbasetype = None
> + self._filteredrepotypes = {}
>
> def _restrictcapabilities(self, caps):
> caps = super(statichttprepository, self)._restrictcapabilities(caps)
More information about the Mercurial-devel
mailing list