[PATCH 01 of 11] bdiff: use Python memory allocator in fixws

Jun Wu quark at fb.com
Thu Mar 9 17:16:02 EST 2017


Excerpts from Gregory Szorc's message of 2017-03-09 14:08:26 -0800:
> On Thu, Mar 9, 2017 at 1:59 PM, Gregory Szorc <gregory.szorc at gmail.com>
> wrote:
> 
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc at gmail.com>
> > # Date 1489089265 28800
> > #      Thu Mar 09 11:54:25 2017 -0800
> > # Node ID 7a8ce919a13a9ff6e73f391fe14d228e18387f15
> > # Parent  1871a1ee64ed49172b1568b89cdbab126284b309
> > bdiff: use Python memory allocator in fixws
> >
> 
> contrib/revsetbenchmarks.py failed to show any detectable needle moving
> from any of these changes. I /think/ roots() got slightly faster. But it
> wasn't reported by the benchmark tool.
> 
> Also, some of the memset() in here are probably not necessary. I put them
> in to maintain compatibility with the previous code.
> 
> The remaining malloc() in mercurial/ after this series are in the pure C
> (non Python) .c files or in places where we don't hold the GIL and can't
> use PyMem_* (such as in manifest.c).
> 
> I've seen malloc() vs PyMem_Malloc() make a performance difference in some
> scenarios. So a follow-up here might be to teach the pure C code to take a
> struct that defines pointers to alloc and free functions so they can
> leverage Python's allocator if warranted.

Could we just use a C macro for that? I don't think we need to maintain both
versions at runtime. So if compiled separately, the C code works without
Python stuff. And by default we use PyMem. There is no pointer dereference
overhead.

I have seen the libc qsort (3) is noticeably slower than C++ std::sort. I
guess that's because function dereference overhead.

> There are also a few places (like in bdiff's fixws) where we could use a
> stack allocation if the input is small and fall back to a (slow) heap
> allocation if it is large.
> 
> Lots of potential for follow-up. But it's probably not the highest priority.
> 
> >
> > Python has its own memory allocation APIs. For allocations
> > <= 512 bytes, it allocates memory from arenas. This means that
> > average small allocations don't call the system allocator, which
> > makes them faster. Also, arena allocations cut down on memory
> > fragmentation, which can matter for performance in long-running
> > processes.
> >
> > Another advantage of using the Python memory allocator is that
> > allocations are tracked by Python. This is a bigger deal in
> > Python 3, as modern versions of Python have some decent built-in
> > tools for examining memory usage, leaks, etc.
> >
> > This patch converts a trivial malloc() + free() in the bdiff code
> > to use the Python allocator APIs. Since the object being
> > operated on is a line, chances are it will use an arena. So,
> > this could have a net positive impact on performance (although
> > I didn't measure it).
> >
> > diff --git a/mercurial/bdiff_module.c b/mercurial/bdiff_module.c
> > --- a/mercurial/bdiff_module.c
> > +++ b/mercurial/bdiff_module.c
> > @@ -158,7 +158,7 @@ static PyObject *fixws(PyObject *self, P
> >         r = PyBytes_AsString(s);
> >         rlen = PyBytes_Size(s);
> >
> > -       w = (char *)malloc(rlen ? rlen : 1);
> > +       w = (char *)PyMem_Malloc(rlen ? rlen : 1);
> >         if (!w)
> >                 goto nomem;
> >
> > @@ -178,7 +178,7 @@ static PyObject *fixws(PyObject *self, P
> >         result = PyBytes_FromStringAndSize(w, wlen);
> >
> >  nomem:
> > -       free(w);
> > +       PyMem_Free(w);
> >         return result ? result : PyErr_NoMemory();
> >  }
> >
> >


More information about the Mercurial-devel mailing list