[PATCH V2] localrepo: cache types for filtered repos (issue5043)

Gregory Szorc gregory.szorc at gmail.com
Mon Jul 3 11:30:10 EDT 2017



> On Jul 3, 2017, at 06:03, Yuya Nishihara <yuya at tcha.org> wrote:
> 
>> On Sat, 01 Jul 2017 20:52:12 -0700, Gregory Szorc wrote:
>> # HG changeset patch
>> # User Gregory Szorc <gregory.szorc at gmail.com>
>> # Date 1498967479 25200
>> #      Sat Jul 01 20:51:19 2017 -0700
>> # Node ID 79e1b61f921b81b59d791c26a92120875020b2b3
>> # Parent  6d678ab1b10d0fddc73003d21aa3c7ec43194e2e
>> localrepo: cache types for filtered repos (issue5043)
> 
> [...]
> 
>> This patch adds a cache of of the dynamically generated repoview/filter
>> types on the localrepo object. Since we only generate each type
>> once, we cap the amount of memory that can leak to something
>> reasonable.
>> 
>> After this change, `hg convert` no longer leaks memory on every
>> revision. The process will likely grow memory usage over time due
>> to e.g. larger manifests. But there are no leaks.
>> 
>> diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
>> --- a/mercurial/localrepo.py
>> +++ b/mercurial/localrepo.py
>> @@ -400,6 +400,9 @@ class localrepository(object):
>>         # post-dirstate-status hooks
>>         self._postdsstatus = []
>> 
>> +        # Cache of types representing filtered repos.
>> +        self._filteredrepotypes = weakref.WeakKeyDictionary()
>> +
>>         # generic mapping between names and nodes
>>         self.names = namespaces.namespaces()
>> 
>> @@ -501,11 +504,21 @@ class localrepository(object):
>> 
>>     def filtered(self, name):
>>         """Return a filtered version of a repository"""
>> -        # build a new class with the mixin and the current class
>> -        # (possibly subclass of the repo)
>> -        class filteredrepo(repoview.repoview, self.unfiltered().__class__):
>> -            pass
>> -        return filteredrepo(self, name)
>> +        # Python <3.4 easily leaks types via __mro__. See
>> +        # https://bugs.python.org/issue17950. We cache dynamically
>> +        # created types so this method doesn't leak on every
>> +        # invocation.
>> +
>> +        key = self.unfiltered().__class__
>> +        if key not in self._filteredrepotypes:
>> +            # Build a new type with the repoview mixin and the base
>> +            # class of this repo. Give it a name containing the
>> +            # filter name to aid debugging.
>> +            bases = (repoview.repoview, key)
>> +            cls = type('%sfilteredrepo' % name, bases, {})
>> +            self._filteredrepotypes[key] = cls
> 
> I think this may still leak a proxy class in hgweb-type applications. Do
> we need a type object per repository instance?

I don't follow. Where's the leak?

If you are proposing a module-level cache, I *think* that will work. As long as weak refs are used, cache entries should go away when the last underlying repo referencing them does.




More information about the Mercurial-devel mailing list