[PATCH rfc] rfc: call gc at exit of mercurial

Gregory Szorc gregory.szorc at gmail.com
Tue May 31 22:32:36 EDT 2016


On Mon, May 30, 2016 at 12:38 AM, Maciej Fijalkowski <fijall at gmail.com>
wrote:

> On Wed, May 4, 2016 at 8:51 AM, Gregory Szorc <gregory.szorc at gmail.com>
> wrote:
> >
> >
> >> On Apr 4, 2016, at 23:37, Maciej Fijalkowski <fijall at gmail.com> wrote:
> >>
> >> On Tue, Apr 5, 2016 at 8:36 AM, Pierre-Yves David
> >> <pierre-yves.david at ens-lyon.org> wrote:
> >>>
> >>>
> >>>> On 04/04/2016 10:31 PM, Maciej Fijalkowski wrote:
> >>>>
> >>>> class A(object):
> >>>>     def __del__(self):
> >>>>         print "del"
> >>>>
> >>>> class B(object):
> >>>>     pass
> >>>>
> >>>> b = B()
> >>>> b.b = b
> >>>> b.a = A()
> >>>>
> >>>>
> >>>> This example does not call __del__ in CPython either.
> >>>>
> >>>> The __del__ is not guaranteed to be called - that's why there is a
> >>>> painful module finalization procedure where CPython is trying to call
> >>>> "as much as possible", but there are still no guarantees. If you add
> >>>> del b; gc.collect() you will see "del" printed. Of course this
> >>>> involves a cycle, but cycles can come in ways that you don't expect
> >>>> them and PyPy simply says "everything is GCed". I think it's very much
> >>>> in line with what python-dev thinks.
> >>>
> >>>
> >>> Which is why we have __del__ in very few objects, and we put massive
> >>> effort into ensuring they don't get caught in cycles, mostly
> >>> succeeding at this. (In the same way, we put a lot of effort into
> >>> making sure __del__ methods are never really called, but keep them
> >>> as a second safety net.)
> >>>
> >>> So in the case we care about (no cycle), CPython would call our __del__,
> >>> right?
> >>>
> >>> --
> >>> Pierre-Yves David
> >>
> >> Yes, but I would argue you can create cycles without knowing. E.g.
> >>
> >> import sys
> >>
> >> def f():
> >>     try:
> >>         some_stuff
> >>     except:
> >>         # the traceback in x refers back to this frame, which holds x
> >>         x = sys.exc_info()
> >>
> >> creates a cycle. There are also ways to create cycles by passing
> >> global functions around, etc.
> >
> > This.
> >
> > We have plenty of cycles in our code. We just don't notice them very
> > often because "hg" processes are short-lived. And what's worse is we
> > don't know we're introducing them unless we go looking for them, often
> > after someone complains about a leak on a large repo.
> >
> > If you want to create cycles and leak memory, I recommend "hg convert"
> > on thousands of revisions with extensions and hooks installed. Or start
> > a WSGI server.
> >
> > One of the reasons I want to get Python 3 support is so we can use its
> > tracemalloc module to help debug leaks. The Python 2 tools for finding
> > cycles and leaks (such as guppy and heapy) are a bit harder to use and
> > to integrate into our testing harness.
>
> I thought the consensus was that yes, there are cycles, no, they're
> not a problem because we need to close files/resources with context
> managers anyway. Why would you want to "debug" cycles?
>

Cycles and files/resources are separate things. But both involve leaks:
you can leak file descriptors and sockets by not using context managers or
"finally" blocks, and you can leak memory by introducing cycles.

You would want to "debug" cycles to figure out why your long-running
Mercurial process is leaking memory and OOMing. I've had to deal with
Mercurial leaking memory in both the WSGI server and in conversion tools,
like `hg convert` and hg-git. A well-behaved application should leak
neither memory nor other resources.
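
A first pass at "debugging" cycles does not need much tooling. The
following is only a rough sketch, not code from the Mercurial tree:
`leaky` and `cyclic_garbage_types` are hypothetical names, and it assumes
Python 3. It disables the automatic collector, runs a function, and then
reports which object types only the cycle collector could reclaim, using
the sys.exc_info() pattern quoted above as the test case.

```python
import gc
import sys


def leaky():
    # Hypothetical stand-in for real code: keeping exc_info() in a local
    # makes the traceback reference the frame that holds the local.
    try:
        raise ValueError("boom")
    except ValueError:
        x = sys.exc_info()


def cyclic_garbage_types(func):
    """Run func() and name the types that only the cycle collector frees."""
    gc.collect()                      # start from a clean slate
    gc.disable()                      # no automatic collections meanwhile
    gc.set_debug(gc.DEBUG_SAVEALL)    # park unreachable objects in gc.garbage
    try:
        func()
        gc.collect()
        return sorted({type(obj).__name__ for obj in gc.garbage})
    finally:
        # Restore the collector's normal behaviour and drop the saved objects.
        gc.set_debug(0)
        del gc.garbage[:]
        gc.enable()


types = cyclic_garbage_types(leaky)
print(types)
```

On CPython I would expect the printed list to include the traceback type,
since the traceback is part of the cycle. A wrapper along these lines
could be run around a single command in a test harness to flag newly
introduced cycles, though that is a suggestion rather than something we
have today.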