[PATCH] nogc: do disable gc for CPython

Gregory Szorc gregory.szorc at gmail.com
Wed May 24 13:49:06 EDT 2017


On Wed, May 24, 2017 at 9:11 AM, Yuya Nishihara <yuya at tcha.org> wrote:

> On Wed, 24 May 2017 08:52:24 -0700, Gregory Szorc wrote:
> > On Wed, May 24, 2017 at 8:30 AM, Yuya Nishihara <yuya at tcha.org> wrote:
> >
> > > On Tue, 23 May 2017 18:03:53 -0700, Gregory Szorc wrote:
> > > > On Tue, May 23, 2017 at 1:20 PM, Jun Wu <quark at fb.com> wrote:
> > > > > # HG changeset patch
> > > > > # User Jun Wu <quark at fb.com>
> > > > > # Date 1495570793 25200
> > > > > #      Tue May 23 13:19:53 2017 -0700
> > > > > # Node ID 66dc3a3b4631507fed3ea7fd2fa3e3356ea820fe
> > > > > # Parent  34e9b8b94f66db7ebe366f67cea7b64bd0ec6968
> > > > > # Available At https://bitbucket.org/quark-zju/hg-draft
> > > > > #              hg pull https://bitbucket.org/quark-zju/hg-draft -r
> > > > > 66dc3a3b4631
> > > > > nogc: do disable gc for CPython
> > > > >
> > > > > 279cd8 makes util.nogc a no-op for Python >= 2.7. That's actually
> > > hurting
> > > > > performance. For example, _fm1readmarkers takes 70ms with gc
> disabled,
> > > > > 190ms
> > > > > with gc enabled.
> > > > >
> > > > > This patch makes util.nogc effective for CPython and only a no-op
> for
> > > PyPy.
> > > > >
> > > > > diff --git a/mercurial/util.py b/mercurial/util.py
> > > > > --- a/mercurial/util.py
> > > > > +++ b/mercurial/util.py
> > > > > @@ -910,9 +910,5 @@ def nogc(func):
> > > > >      into. As a workaround, disable GC while building complex
> (huge)
> > > > >      containers.
> > > > > -
> > > > > -    This garbage collector issue have been fixed in 2.7.
> > > > >      """
> > > > > -    if sys.version_info >= (2, 7):
> > > > > -        return func
> > > > >      def wrapper(*args, **kwargs):
> > > > >          gcenabled = gc.isenabled()
> > > > > @@ -925,4 +921,9 @@ def nogc(func):
> > > > >      return wrapper
> > > > >
> > > > > +if pyplatform.python_implementation() != 'CPython':
> > > > > +    # PyPy becomes slower with gc disabled
> > > > >
> > > >
> > > > Since this patch sounds like it has more to do with PyPy, then
> > > > implementation should check for =pypy, not !=cpython. Other than
> that,
> > > I'm
> > > > fine keeping the GC in place on PyPy, as its JIT should mitigate
> impact
> > > > from garbage spewing code.
> > > >
> > > > Also, regarding this whole no gc concept, I'll argue that any
> operation
> > > > needing to disable GC should be implemented in C in such a way that
> it
> > > > isn't instantiating thousands of CPython objects.
> > >
> > > Last time I tested fm1readmarkers() of C extension, I didn't see
> noticeable
> > > difference with/without GC (though I just checked wall clock.) So I
> suspect
> > > the perf result depends on system, size of obsstore, etc.
> > >
> > > > So many of our C routines
> > > > are relatively slow (revlog index, obsmarkers, dag walking, etc)
> because
> > > > they operate on PyObject instead of raw C data structures. If we
> remove
> > > > PyObject from critical path code, we eliminate most GC concerns and
> > > likely
> > > > make execution at *least* 5x faster.
> > >
> > > That's true for obsmarkers, but IIRC revlog index doesn't create a
> boxed
> > > object until it is accessed.
> >
> > That is true. And it is smart enough to reuse the PyObject on subsequent
> > access. However, many things realize PyObject when they don't need to.
> For
> > example, when calling revlog.revision(), one has to determine the delta
> > chain and start and end offsets for each revision. This requires
> > constructing the index entry tuples (which consist of 9 PyObject - the
> > tuple and its 8 members). For a delta chain of 10,000 (not uncommon for
> > manifests on large repos), you've created 90,000+ PyObjects!
>
> Got it, thanks. Maybe the first example could be mitigated by using a
> dedicated
> type backed by C struct (like dirstate tuple), but it would just make the
> number
> of PyObjects to x1/9.
>

Agreed. I want to convert the index type from a tuple to at least a named
tuple. At the point you have a custom type, it isn't that much work to
instantiate the PyObject members lazily.


>
> > A better
> > solution is to implement these critical functions in !Python and/or pass
> > around arrays of "packed" data so we only realize the Python objects on
> > demand.
>
> (I read !Python = Rust or anything else that isn't C of CPython. I don't
> wanna fix bugs of ref-counting. :)
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20170524/4df06a1c/attachment.html>


More information about the Mercurial-devel mailing list