[PATCH] nogc: do disable gc for CPython

Yuya Nishihara yuya at tcha.org
Wed May 24 12:11:19 EDT 2017


On Wed, 24 May 2017 08:52:24 -0700, Gregory Szorc wrote:
> On Wed, May 24, 2017 at 8:30 AM, Yuya Nishihara <yuya at tcha.org> wrote:
> 
> > On Tue, 23 May 2017 18:03:53 -0700, Gregory Szorc wrote:
> > > On Tue, May 23, 2017 at 1:20 PM, Jun Wu <quark at fb.com> wrote:
> > > > # HG changeset patch
> > > > # User Jun Wu <quark at fb.com>
> > > > # Date 1495570793 25200
> > > > #      Tue May 23 13:19:53 2017 -0700
> > > > # Node ID 66dc3a3b4631507fed3ea7fd2fa3e3356ea820fe
> > > > # Parent  34e9b8b94f66db7ebe366f67cea7b64bd0ec6968
> > > > # Available At https://bitbucket.org/quark-zju/hg-draft
> > > > #              hg pull https://bitbucket.org/quark-zju/hg-draft -r
> > > > 66dc3a3b4631
> > > > nogc: do disable gc for CPython
> > > >
> > > > 279cd8 makes util.nogc a no-op for Python >= 2.7. That's actually
> > hurting
> > > > performance. For example, _fm1readmarkers takes 70ms with gc disabled,
> > > > 190ms
> > > > with gc enabled.
> > > >
> > > > This patch makes util.nogc effective for CPython and only a no-op for
> > PyPy.
> > > >
> > > > diff --git a/mercurial/util.py b/mercurial/util.py
> > > > --- a/mercurial/util.py
> > > > +++ b/mercurial/util.py
> > > > @@ -910,9 +910,5 @@ def nogc(func):
> > > >      into. As a workaround, disable GC while building complex (huge)
> > > >      containers.
> > > > -
> > > > -    This garbage collector issue have been fixed in 2.7.
> > > >      """
> > > > -    if sys.version_info >= (2, 7):
> > > > -        return func
> > > >      def wrapper(*args, **kwargs):
> > > >          gcenabled = gc.isenabled()
> > > > @@ -925,4 +921,9 @@ def nogc(func):
> > > >      return wrapper
> > > >
> > > > +if pyplatform.python_implementation() != 'CPython':
> > > > +    # PyPy becomes slower with gc disabled
> > > >
> > >
> > > Since this patch sounds like it has more to do with PyPy, then
> > > implementation should check for =pypy, not !=cpython. Other than that,
> > I'm
> > > fine keeping the GC in place on PyPy, as its JIT should mitigate impact
> > > from garbage spewing code.
> > >
> > > Also, regarding this whole no gc concept, I'll argue that any operation
> > > needing to disable GC should be implemented in C in such a way that it
> > > isn't instantiating thousands of CPython objects.
> >
> > Last time I tested fm1readmarkers() of C extension, I didn't see noticeable
> > difference with/without GC (though I just checked wall clock.) So I suspect
> > the perf result depends on system, size of obsstore, etc.
> >
> > > So many of our C routines
> > > are relatively slow (revlog index, obsmarkers, dag walking, etc) because
> > > they operate on PyObject instead of raw C data structures. If we remove
> > > PyObject from critical path code, we eliminate most GC concerns and
> > likely
> > > make execution at *least* 5x faster.
> >
> > That's true for obsmarkers, but IIRC revlog index doesn't create a boxed
> > object until it is accessed.
> 
> That is true. And it is smart enough to reuse the PyObject on subsequent
> access. However, many things realize PyObject when they don't need to. For
> example, when calling revlog.revision(), one has to determine the delta
> chain and start and end offsets for each revision. This requires
> constructing the index entry tuples (which consist of 9 PyObject - the
> tuple and its 8 members). For a delta chain of 10,000 (not uncommon for
> manifests on large repos), you've created 90,000+ PyObjects!

Got it, thanks. Maybe the first example could be mitigated by using a dedicated
type backed by C struct (like dirstate tuple), but it would just make the number
of PyObjects to x1/9.

> A better
> solution is to implement these critical functions in !Python and/or pass
> around arrays of "packed" data so we only realize the Python objects on
> demand.

(I read !Python = Rust or anything else that isn't C of CPython. I don't
wanna fix bugs of ref-counting. :)


More information about the Mercurial-devel mailing list