[PATCH] store: break up reference cycle introduced in 9cbff8a39a2a
mpm at selenic.com
Thu May 5 12:37:37 CDT 2011
On Thu, 2011-05-05 at 09:36 +0200, Martin Geisler wrote:
> Adrian Buehlmann <adrian at cadifra.com> writes:
> > I think the reference cycle in 9cbff8a39a2a goes like this:
> >
> >   --> fncacheopener object
> >   |          |
> >   |    local scope of function fncachestore.__init__
> >   |          |
> >   --- fncache object bound to name fnc
> >
> > Not sure if that's really a problem though.
> >
> > With my patch, the references go like this:
> >
> >   -- fncachestore object
> >   |         |
> >   |   _fncacheopener object --
> >   |         |                |
> >   -> fncache object          |
> >             |                |
> >       object of openertype <--
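(In code, the two shapes in the drawings look roughly like this. These are simplified stand-ins, not the real hg store code: `FncacheLike`, `build_with_closure`, and `_FncacheOpenerLike` are illustrative names only.)

```python
# Pre-patch shape: the opener is a closure whose cell captures the local
# `fnc`, and `fnc` in turn holds the opener -- a reference cycle.
import gc
import weakref


class FncacheLike:
    def __init__(self, opener=None):
        self.opener = opener          # fncache -> opener
        self.entries = set()

    def add_entry(self, path):
        self.entries.add(path)


def build_with_closure(raw_open):
    def fncacheopener(path):          # closure cell captures `fnc`
        fnc.add_entry(path)           # opener -> local scope -> fncache
        return raw_open(path)
    fnc = FncacheLike(fncacheopener)  # fncache -> opener: cycle closed
    return fnc


# Post-patch shape: a small dedicated class holds the fncache one-way;
# nothing in the chain refers back up, so there is no cycle.
class _FncacheOpenerLike:
    def __init__(self, raw_open, fnc):
        self.raw_open = raw_open
        self.fnc = fnc                # one-way reference only

    def __call__(self, path):
        self.fnc.add_entry(path)
        return self.raw_open(path)


def build_with_class(raw_open):
    fnc = FncacheLike()               # fncache knows nothing of the wrapper
    return _FncacheOpenerLike(raw_open, fnc)
```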
> I think that very nice drawing should go into the commit message.
> But more generally: Python *can* garbage collect some reference cycles,
> namely the cycles where none of the object define a destructor. This is
> sort of described here:
> I just wanted to mention this since it seems that people are hunting
> reference cycles just for the sake of hunting them. If removing a cycle
> makes the code architecture cleaner then that's perfect -- go ahead! But
> if not, then I think we can leave a couple of cycles behind.
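A minimal demonstration of Martin's point (an illustration, not hg code): a cycle whose objects define no destructor never dies by refcounting alone, but CPython's cycle collector reclaims it on the next collection.

```python
import gc
import weakref


class Node:
    pass                      # no __del__: nothing stops the collector


gc.disable()                  # make collection explicit for the demo
a, b = Node(), Node()
a.peer, b.peer = b, a         # a cycle: refcounts can never reach zero
probe = weakref.ref(a)
del a, b                      # only the cycle keeps the pair alive now
alive_before = probe() is not None
gc.collect()                  # the cycle collector finds and frees them
alive_after = probe() is not None
gc.enable()
```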
Here's a brief rant:
Why Cyclical Garbage Collection Sucks Compared to Reference Counting
1: You can't have meaningful destructors, because when destruction
happens is undefined. And going-out-of-scope destructors are extremely
useful. Python is already rather broken in this regard, so feel free
to ignore this point.
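Point 1 in miniature (illustration only, not hg code): under pure refcounting, `__del__` runs the instant the last reference disappears; inside a cycle, it runs only whenever the collector decides to. (On Python 2 a cycle containing a `__del__` was not collected at all and sat in `gc.garbage`; Python 3.4+ collects it, but still at an undefined later time.)

```python
import gc

events = []


class Resource:
    def __del__(self):               # "destructor" whose timing we observe
        events.append("closed")


gc.disable()
r = Resource()
del r                                # refcount hits zero: runs right here
prompt = list(events)                # ["closed"]

r = Resource()
r.me = r                             # hide the object inside a cycle
del r                                # nothing happens yet...
delayed = list(events)               # still just ["closed"]
gc.collect()                         # ...until a collection finally runs it
final = list(events)
gc.enable()
```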
2: Memory usage is defined by garbage collector policy, not program
architecture. But 99.9% of machines run operating systems where
individual applications have little visibility into overall system
memory pressure, so garbage collection, which is fundamentally lazy
about giving memory back, is hostile to real world operating systems.
And operating systems will be hostile right back. When you incur a page
fault that requires a new page on a memory mapped region under memory
pressure, it is _too late_ to nicely ask your process to clean up after
itself. The only real option is to kill your process.
Various people (like some of the PyPy devs) think that this situation
can be fixed by making kernels (for multiple OSes!) garbage-collector
aware through various crazy schemes. As someone who's rather intimate
with the Linux kernel's memory management, I'm fairly sure that's Not
Going To Happen.
3: Garbage collection is fundamentally hostile to performance on modern
CPU architectures. It's all about following long chains of cache-cold
non-local references, and that's about the worst thing you can possibly
do with a CPU cache. Compare this with reference counters: they're
typically both hot and local. Yes, they have overhead, and they may even
have more overhead when you just count the number of operations. But what
actually matters is cycle count. And when you're chasing down references
outside your working set, a single-cycle pointer indirection can turn
into a hundred cycles of waiting on main memory.
4: Garbage collection is fundamentally hostile to latency. Because
garbage collection is lazy and all about making a big batch of work for
later, and when later occurs is up to the garbage collector, any
operation can potentially be interrupted for millions or billions of CPU
cycles. While some modern GCs do a decent job of bounding that towards
the lower end of that scale, it's still an eternity in the real-time world.
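A rough sketch of the latency point (illustrative, with an invented helper name): a full collection must walk the entire heap of tracked objects, so its pause grows with heap size, and it can be triggered by an unrelated allocation at any point in the program.

```python
import gc
import time


def full_collection_pause(n):
    # Build n tracked objects linked into one big cycle-containing graph.
    nodes = [[] for _ in range(n)]
    for i, node in enumerate(nodes):
        node.append(nodes[i - 1])    # each node references its neighbour
    gc.disable()                     # collect only when we say so
    start = time.perf_counter()
    gc.collect()                     # scans all n nodes; they are live, so
                                     # nothing is freed, but all are traversed
    pause = time.perf_counter() - start
    gc.enable()
    return pause
```

The pause is roughly proportional to the number of tracked objects, independent of what the program was doing when the collection fired.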
5: On the upside, it can collect cycles, if your program should be so
foolish as to make them.
CPython mostly gets this right by only using lazy garbage collection as
a last resort. PyPy and friends regress here.
In terms of Mercurial architecture, we mostly care about point 2 - we
don't want to hold onto (potentially very large) blocks of memory longer
than we have to.
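Point 2 in miniature (an illustration, not Mercurial code): without a cycle, refcounting returns a large buffer the moment the last name goes away; put the holder in a cycle and the memory lingers until a collection happens to run.

```python
import gc
import weakref


class Holder:
    def __init__(self, nbytes):
        self.data = bytearray(nbytes)   # a big block we'd like back promptly


gc.disable()                            # stand-in for "the collector is lazy"

h = Holder(8 * 1024 * 1024)
probe = weakref.ref(h)
del h                                   # refcount hits zero: ~8 MB freed now
assert probe() is None

h = Holder(8 * 1024 * 1024)
h.owner = h                             # an accidental cycle
probe = weakref.ref(h)
del h
assert probe() is not None              # the buffer is still held...
gc.collect()
assert probe() is None                  # ...until a collection gets to it
gc.enable()
```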
Mathematics is the supreme nostalgia of our time.