[PATCH] nogc: do disable gc for CPython

Gregory Szorc gregory.szorc at gmail.com
Wed May 24 11:52:24 EDT 2017


On Wed, May 24, 2017 at 8:30 AM, Yuya Nishihara <yuya at tcha.org> wrote:

> On Tue, 23 May 2017 18:03:53 -0700, Gregory Szorc wrote:
> > On Tue, May 23, 2017 at 1:20 PM, Jun Wu <quark at fb.com> wrote:
> > > # HG changeset patch
> > > # User Jun Wu <quark at fb.com>
> > > # Date 1495570793 25200
> > > #      Tue May 23 13:19:53 2017 -0700
> > > # Node ID 66dc3a3b4631507fed3ea7fd2fa3e3356ea820fe
> > > # Parent  34e9b8b94f66db7ebe366f67cea7b64bd0ec6968
> > > # Available At https://bitbucket.org/quark-zju/hg-draft
> > > #              hg pull https://bitbucket.org/quark-zju/hg-draft -r
> > > 66dc3a3b4631
> > > nogc: do disable gc for CPython
> > >
> > > 279cd8 makes util.nogc a no-op for Python >= 2.7. That's actually
> hurting
> > > performance. For example, _fm1readmarkers takes 70ms with gc disabled,
> > > 190ms
> > > with gc enabled.
> > >
> > > This patch makes util.nogc effective for CPython and only a no-op for
> PyPy.
> > >
> > > diff --git a/mercurial/util.py b/mercurial/util.py
> > > --- a/mercurial/util.py
> > > +++ b/mercurial/util.py
> > > @@ -910,9 +910,5 @@ def nogc(func):
> > >      into. As a workaround, disable GC while building complex (huge)
> > >      containers.
> > > -
> > > -    This garbage collector issue have been fixed in 2.7.
> > >      """
> > > -    if sys.version_info >= (2, 7):
> > > -        return func
> > >      def wrapper(*args, **kwargs):
> > >          gcenabled = gc.isenabled()
> > > @@ -925,4 +921,9 @@ def nogc(func):
> > >      return wrapper
> > >
> > > +if pyplatform.python_implementation() != 'CPython':
> > > +    # PyPy becomes slower with gc disabled
> > >
> >
> > Since this patch sounds like it has more to do with PyPy, then
> > implementation should check for =pypy, not !=cpython. Other than that,
> I'm
> > fine keeping the GC in place on PyPy, as its JIT should mitigate impact
> > from garbage spewing code.
> >
> > Also, regarding this whole no gc concept, I'll argue that any operation
> > needing to disable GC should be implemented in C in such a way that it
> > isn't instantiating thousands of CPython objects.
>
> Last time I tested fm1readmarkers() of C extension, I didn't see noticeable
> difference with/without GC (though I just checked wall clock.) So I suspect
> the perf result depends on system, size of obsstore, etc.
>
> > So many of our C routines
> > are relatively slow (revlog index, obsmarkers, dag walking, etc) because
> > they operate on PyObject instead of raw C data structures. If we remove
> > PyObject from critical path code, we eliminate most GC concerns and
> likely
> > make execution at *least* 5x faster.
>
> That's true for obsmarkers, but IIRC revlog index doesn't create a boxed
> object until it is accessed.
>

That is true. And it is smart enough to reuse the PyObject on subsequent
access. However, many things realize PyObject when they don't need to. For
example, when calling revlog.revision(), one has to determine the delta
chain and start and end offsets for each revision. This requires
constructing the index entry tuples (which consist of 9 PyObject - the
tuple and its 8 members). For a delta chain of 10,000 (not uncommon for
manifests on large repos), you've created 90,000+ PyObjects! A better
solution is to implement these critical functions in !Python and/or pass
around arrays of "packed" data so we only realize the Python objects on
demand.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20170524/058d5714/attachment.html>


More information about the Mercurial-devel mailing list