<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Mar 13, 2017 at 10:15 PM, Gregory Szorc <span dir="ltr"><<a href="mailto:gregory.szorc@gmail.com" target="_blank">gregory.szorc@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"># HG changeset patch<br>

# User Gregory Szorc <<a href="mailto:gregory.szorc@gmail.com">gregory.szorc@gmail.com</a>><br>

# Date 1489454332 25200<br>

#      Mon Mar 13 18:18:52 2017 -0700<br>

# Node ID 6d3b13f243bea2c9200eecb827d7ec<wbr>0ea771fc54<br>

# Parent  3b997adb7efece7395fb585a70b2d7<wbr>0522626648<br>

parsers: avoid PyList_Append when parsing obs markers<br>

<br>

PyList_Append() has to continuously grow the underlying list as<br>

available storage is exhausted. This adds overhead within tight<br>

loops.<br>

<br>

It is better to create a new list of known size then use the<br>

PyList_SET_ITEM macro to populate each element of the list<br>

without error checking. This commit switches us to that<br>

mechanism.<br>

<br>

The new code establishes an array to hold PyObject* instances<br>

then iterates the array at the end. The bulk of the new code<br>

is for memory management of the new array. There are some<br>

inline comments explaining how the array is sized.<br>

<br>

`hg perfloadmarkers` show an improvement with this change:<br>

<br>

! result: 130264<br>

! wall 0.125479 comb 0.130000 user 0.120000 sys 0.010000 (best of 76)<br>

! wall 0.120487 comb 0.130000 user 0.120000 sys 0.010000 (best of 79)<br></blockquote><div><br></div><div>This series makes obs marker loading (`hg perfloadmarkers`) ~4x faster.<br><br></div><div>However, on that same repo where this series made just loading ~95ms faster (which is nothing to sneeze at), `hg log -r .` decreased from ~740ms to ~440ms. So I guess the lazy loading of obs markers has benefits to other operations. Perhaps we're loading the obs store multiple times? Or perhaps my series busted things so much in the RFC parts that the numbers are misleading.<br><br></div><div>Assuming there is some truth to the numbers, holy cow the overhead of PyObject creation is crazy. I suddenly feel like auditing the C code for everywhere we're instantiating PyObjects before it is necessary. I know we do this in the revlog index. I wonder if dirstate is impacted...<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

diff --git a/mercurial/parsers.c b/mercurial/parsers.c<br>

--- a/mercurial/parsers.c<br>

+++ b/mercurial/parsers.c<br>

@@ -2792,33 +2792,77 @@ static PyObject *fm1readmarkers(PyObject<br>

        int datalen;<br>

        Py_ssize_t offset, stop;<br>

        PyObject *markers = NULL;<br>

+       PyObject **tmpmarkers = NULL;<br>

+       Py_ssize_t tmpmarkerssize;<br>

+       Py_ssize_t tmpmarkersoffset = 0;<br>

<br>

        if (!PyArg_ParseTuple(args, "s#nn", &data, &datalen, &offset, &stop)) {<br>

                return NULL;<br>

        }<br>

        dataend = data + datalen;<br>

        data += offset;<br>

-       markers = PyList_New(0);<br>

-       if (!markers) {<br>

+<br>

+       /*<br>

+          We store an array of pointers to markers then build a list from<br>

+          it at the end. This is faster than appending to a PyList for each<br>

+          element.<br>

+<br>

+          Reallocating the array constantly could be expensive. We estimate<br>

+          the size of the array by assuming 90 bytes per marker, which is<br>

+          based on data seen in the wild.<br>

+       */<br>

+       tmpmarkerssize = datalen / 90;<br>

+       /* Handle datalen < 90 */<br>

+       tmpmarkerssize = tmpmarkerssize ? tmpmarkerssize : 10;<br>

+<br>

+       tmpmarkers = (PyObject **)PyMem_Malloc(<br>

+               tmpmarkerssize * sizeof(PyObject *));<br>

+       if (!tmpmarkers) {<br>

+               PyErr_NoMemory();<br>

                return NULL;<br>

        }<br>

+<br>

        while (offset < stop) {<br>

                uint32_t msize;<br>

-               int error;<br>

+               void *tmpbuffer;<br>

                PyObject *record = fm1readmarker(data, dataend, &msize);<br>

                if (!record) {<br>

                        goto bail;<br>

                }<br>

-               error = PyList_Append(markers, record);<br>

-               Py_DECREF(record);<br>

-               if (error) {<br>

-                       goto bail;<br>

+<br>

+               /* Allocate more array storage if needed */<br>

+               if (tmpmarkersoffset + 1 == tmpmarkerssize) {<br>

+                       /* 5000 chosen somewhat arbitrarily. */<br>

+                       tmpmarkerssize += 5000;<br>

+                       tmpbuffer = PyMem_Realloc(tmpmarkers,<br>

+                               tmpmarkerssize * sizeof(PyObject*));<br>

+                       if (!tmpbuffer) {<br>

+                               PyErr_NoMemory();<br>

+                               goto bail;<br>

+                       }<br>

+                       tmpmarkers = (PyObject **)tmpbuffer;<br>

                }<br>

+<br>

+               tmpmarkers[tmpmarkersoffset++] = record;<br>

                data += msize;<br>

                offset += msize;<br>

        }<br>

+<br>

+       /* Convert our array of pointers to a PyList */<br>

+       markers = PyList_New(tmpmarkersoffset);<br>

+       if (!markers) {<br>

+               goto bail;<br>

+       }<br>

+<br>

+       for (offset = 0; offset < tmpmarkersoffset; offset++) {<br>

+               PyList_SET_ITEM(markers, offset, tmpmarkers[offset]);<br>

+       }<br>

+<br>

+       free(tmpmarkers);<br>

+<br>

        return markers;<br>

 bail:<br>

+       free(tmpmarkers);<br>

        Py_DECREF(markers);<br>

        return NULL;<br>

 }<br>

</blockquote></div><br></div></div>