[PATCH] store: break up reference cycle introduced in 9cbff8a39a2a

Sune Foldager cryo at cyanite.org
Thu May 5 13:28:17 CDT 2011


On Thu, May 05, 2011 at 12:37:37 -0500, Matt Mackall wrote:
>On Thu, 2011-05-05 at 09:36 +0200, Martin Geisler wrote:
>> Adrian Buehlmann <adrian at cadifra.com> writes:
>>
>> > I think the reference cycle in 9cbff8a39a2a goes like this:
>> >
>> >       --> fncacheopener object
>> >      |        |
>> >      |    local scope of function fncachestore.__init__
>> >      |        |
>> >       --- fncache object bound to name fnc
>> >
>> > Not sure if that's really a problem though.
>> >
>> > With my patch, the references go like this:
>> >
>> >        -- fncachestore object
>> >       |       |
>> >       |   _fncacheopener object  --
>> >       |       |                    |
>> >        -> fncache object           |
>> >               |                    |
>> >           object of openertype  <--
>>
>> I think that very nice drawing should go into the commit message.
>>
>> But more generally: Python *can* garbage collect some reference cycles,
>> namely the cycles where none of the object define a destructor. This is
>> sort of described here:
>>
>>   http://docs.python.org/library/gc.html#gc.garbage
>>
>> I just wanted to mention this since it seems that people are hunting
>> reference cycles just for the sake of hunting them. If removing a cycle
>> makes the code architecture cleaner then that's perfect -- go ahead! But
>> if not, then I think we can leave a couple of cycles behind.
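
That gc.garbage point is easy to demonstrate: a plain cycle whose objects
define no destructor is reclaimed by the cycle collector (under Python 2,
objects in a cycle that define __del__ land in gc.garbage instead of being
freed). A minimal sketch, using weakref only to observe the object's
lifetime:

```python
import gc
import weakref

class Node(object):
    """Plain object with no __del__, so the cycle collector may reclaim it."""
    pass

a, b = Node(), Node()
a.other, b.other = b, a      # two-object reference cycle
probe = weakref.ref(a)       # lets us observe when 'a' is actually freed

del a, b                     # drop the only external references
gc.collect()                 # the cycle collector breaks and frees the cycle
print(probe() is None)       # True: the cycle was reclaimed
```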
>
>Here's a brief rant:

Well this seems a bit one-sided...

>Why Cyclical Garbage Collection Sucks Compared to Reference Counting
>
>1: You can't have meaningful destructors, because when destruction
>happens is undefined. And going-out-of-scope destructors are extremely
>useful. Python is already rather broken in this regard, so feel free
>to ignore this point.

GC'ed languages get along fine without them, using constructs like using
(in C#) and with (in Python)... while dtors are nice at times, I find the
need for them overrated myself.
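
The with pattern ties cleanup to scope exit rather than to when (or
whether) an object is destroyed. A sketch of the context-manager protocol
(the Resource class here is hypothetical, standing in for any handle that
must be released promptly):

```python
class Resource(object):
    """Hypothetical resource whose cleanup must not wait for the GC."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True
    # Context-manager protocol: cleanup runs deterministically when the
    # 'with' block exits, regardless of when the object is collected.
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()
        return False       # don't suppress exceptions

with Resource() as r:
    pass                   # use the resource
print(r.closed)            # True: closed at block exit, no destructor needed
```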

>2: Memory usage is defined by garbage collector policy, not program
>architecture. But 99.9% of machines run operating systems where
>individual applications have little visibility into overall system
>memory pressure, so garbage collection, which is fundamentally lazy
>about giving memory back, is hostile to real world operating systems.

Techniques like generational collection are used in modern runtimes and
give pretty good performance in practice, in my observation. Are you only
thinking of Linux + some gc'ed runtime you don't like?
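
CPython itself is an example: on top of reference counting it runs a
three-generation cycle collector, and the gc module exposes its knobs. A
quick look (the exact default thresholds vary between versions, so the
values below are illustrative):

```python
import gc

# The cycle collector tracks container objects in three generations;
# young generations are scanned often, old ones rarely, which keeps most
# collection passes cheap in practice.
print(gc.get_threshold())   # allocation counts that trigger a collection
                            # of each generation, e.g. (700, 10, 10)
print(len(gc.get_count()))  # 3: pending object counts, one per generation
```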

>And operating systems will be hostile right back. When you incur a page
>fault that requires a new page on a memory mapped region under memory
>pressure, it is _too late_ to nicely ask your process to clean up after
>itself. The only real option is to kill your process.

And how often does that happen?... again, generational collectors help
here. I think I've had the kernel kill my process once (on Linux).

>Various people (like some of the PyPy devs) think that this situation
>can be fixed by making kernels (for multiple OSes!) garbage-collector
>aware through various crazy schemes. As someone who's rather intimate
>with the Linux kernel's memory management[1], I'm fairly sure that's Not
>Going To Happen.

At least you're not biased ("crazy schemes") ;-). I think that, on Windows,
the CLR integrates with the kernel a bit like that. Maybe crazy, but
most likely efficient. I don't know the details, but the CLR at least
subscribes to some memory pressure events.

>3: Garbage collection is fundamentally hostile to performance on modern
>CPU architectures. It's all about following long chains of cache-cold
>non-local references, and that's about the worst thing you can possibly
>do with a CPU cache. Compare this with reference counters: they're
>typically both hot and local. Yes, they have overhead, and they may even
>have more overhead when you just count number of operations. But what
>actually matters is cycle count. And when you're chasing down references
>outside your working set, a single-cycle pointer indirection can turn
>into a hundred cycles of waiting on main memory.

But if this were unconditionally true, why can you find studies showing
that GCs can in many cases perform better than ref-counting schemes?

>4: Garbage collection is fundamentally hostile to latency. Because
>garbage collection is lazy and all about making a big batch of work for
>later, and when later occurs is up to the garbage collector, any
>operation can potentially be interrupted for millions or billions of CPU
>cycles. While some modern GCs do a decent job of bounding that towards
>the lower end of that scale, it's still an eternity in the real-time
>world.

Generational collectors alleviate this a lot, I think.

>5: On the upside, it can collect cycles, if your program should be so
>foolish as to make them.

Hehe, I really think you come off as pretty biased... since when is it
considered "foolish" to create reference cycles?? In my opinion ref
counting kinda sucks because you need to be careful not to create
cycles... :p  a matter of taste I guess.
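
For what it's worth, that care is easy to demonstrate in CPython, where
reference counting alone never reclaims a cycle; only the cycle collector
does (weakref is used below just to observe the object's lifetime):

```python
import gc
import weakref

class Node(object):
    pass

gc.disable()                 # isolate pure reference counting
a, b = Node(), Node()
a.other, b.other = b, a      # the cycle ref counting must be careful about
probe = weakref.ref(a)

del a, b
alive_after_del = probe() is not None   # True: refcounts never hit zero,
                                        # so nothing was freed

gc.collect()                 # the cycle collector has to step in
gc.enable()
alive_after_gc = probe() is not None    # False: only now is it reclaimed
print(alive_after_del, alive_after_gc)
```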


More information about the Mercurial-devel mailing list