[PATCH 2 of 2] keyword: eliminate potential reference cycles from kwrepo

Matt Mackall mpm at selenic.com
Tue Jul 7 12:14:35 CDT 2009


On Tue, 2009-07-07 at 17:59 +0200, Martin Geisler wrote:
> Christian Ebert <blacktrash at gmx.net> writes:
> 
> Hi Christian
> 
> > [...
> >
> > ATM there is a mixture of self.repomemberfunc and repo.repomemberfunc,
> > and when I tried to clean this up Matt chimed in like so:
> >
> > http://marc.info/?l=mercurial-devel&m=124654567201924&w=2
> 
> Ah, I had forgotten about that mail...
> 
> >>>         def kwcommitctx(self, ctx, error=False):
> >>>             wlock = lock = None
> >>>             try:
> >>> -                wlock = self.wlock()
> >>> -                lock = self.lock()
> >>> +                wlock = super(kwrepo, self).wlock()
> >>> +                lock = super(kwrepo, self).lock()
> >
> > Third working alternative:
> >                  wlock = repo.wlock()
> >                  lock = repo.wlock()
> >
> > Basically I thought it would be good to use 1 of the 3 alternatives
> > throughout.
> >
> >> Why is this necessary? The kwrepo class does not override the wlock
> >> and lock methods, so I don't see why you would explicitly call the
> >> superclass' methods?
> >
> > I'm not sure if this is necessary either. My layman's understanding
> > is: it doesn't break, it works, fine ;-) However, this attitude is
> > difficult to reconcile with my desire for consistency. And after
> > Matt's reply I /believe/ calling the superclass method explicitly is
> > safer.
> 
> I think he was talking about a different situation. He talked about
> doing
> 
>   self.foo = self.bar
> 
> which apparently creates a reference cycle between self and itself(!)...
> I don't know the finer details of Python reference counting, but it
> surpricing that such an assignment should be dangerous.

Now that I'm not replying to email on a phone, I'll try to be a bit more
expansive. There are two different problems.

The first is the:

 self.foo = self.bar

issue. self.bar here isn't simply a "pointer to function bar in
self.__class__". That's what's called an unbound method pointer. Instead
it's a bound method point: it remembers that it's the bar method on the
object self. Consider:

  a = foo.bar
  a()  # equivalent to foo.bar()

  foo2.bar = foo.bar
  foo2.bar() # STILL equivalent to foo.bar()

Bound methods hold references to the object in question, so when you
assign a bound method to an object member, there's now a reference cycle
and the object won't get destroyed when it goes out of scope. This is a
problem for bundlerepo and a couple other classes which use __del__ to
clean up.

How do we work around this? Good question. In the case of kwrepo, where
we're defining a new class already, the easiest thing to do is simply to
superclass:

class baz(foo):
  def bar(self):
    # add our baz-specific behavior
    ...
    # fall through to base behavior where appropriate
    super(baz, self).bar()

Then no reassignment is necessary and we never need to deal with bound
methods.

You might ask: is there such a thing as an unbound method? Couldn't we
make one of those and then assign it? We can get unbound methods by
poking around in an object's class, but simply assigning them to an
object doesn't get us the desired mechanics: self isn't automatically
passed to a method that's found in an object's dictionary (otherwise our
original self.bar = self.foo wouldn't work!). It's only passed when
we're looking that method up in our class definition.

The second issue is repo.foo vs self.foo. If you have something that
looks like:

def wrapobject(repo):
  class superrepo(repo.__class__):
    def foo(self):
      repo.val += 1
  repo.__class__ == superrepo

..that repo.val pins a reference to repo from the surrounding scope
inside the class definition. When we slap it onto the repo object, repo
now contains a reference loop again. Changing that repo.val to self.val
cures that.

-- 
http://selenic.com : development and support for Mercurial and Linux




More information about the Mercurial-devel mailing list