[PATCH 3 of 7] parsers: avoid PyList_Append when parsing obs markers

Yuya Nishihara yuya at tcha.org
Thu Mar 16 10:20:48 EDT 2017


On Mon, 13 Mar 2017 22:15:45 -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1489454332 25200
> #      Mon Mar 13 18:18:52 2017 -0700
> # Node ID 6d3b13f243bea2c9200eecb827d7ec0ea771fc54
> # Parent  3b997adb7efece7395fb585a70b2d70522626648
> parsers: avoid PyList_Append when parsing obs markers
> 
> PyList_Append() has to continuously grow the underlying list as
> available storage is exhausted. This adds overhead within tight
> loops.
> 
> It is better to create a new list of known size and then use the
> PyList_SET_ITEM macro to populate each element of the list
> without error checking. This commit switches us to that
> mechanism.
> 
> The new code establishes an array to hold PyObject* instances and
> then iterates that array at the end to build the list. The bulk
> of the new code is for memory management of the new array. There
> are some inline comments explaining how the array is sized.
> 
> `hg perfloadmarkers` shows an improvement with this change:
> 
> ! result: 130264
> ! wall 0.125479 comb 0.130000 user 0.120000 sys 0.010000 (best of 76)
> ! wall 0.120487 comb 0.130000 user 0.120000 sys 0.010000 (best of 79)

I'm not so enthusiastic about this perf number (roughly a 4% wall-time
improvement) relative to the complexity of the additional memory
management. Does this patch contribute meaningfully to the net perf win
of the series?
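
For context, the two list-building strategies under discussion boil
down to something like this (a minimal sketch, not the patch's actual
code; `n` stands for an element count known or estimated up front):

  #include <Python.h>

  /* Status quo: grow the list one append at a time. */
  static PyObject *build_by_append(Py_ssize_t n)
  {
  	PyObject *list = PyList_New(0);
  	Py_ssize_t i;
  	if (!list)
  		return NULL;
  	for (i = 0; i < n; i++) {
  		PyObject *item = PyLong_FromSsize_t(i);
  		if (!item || PyList_Append(list, item) < 0) {
  			Py_XDECREF(item);
  			Py_DECREF(list);
  			return NULL;
  		}
  		Py_DECREF(item); /* PyList_Append() took its own reference */
  	}
  	return list;
  }

  /* Commit message's approach: create a list of known size, then
     fill it without per-item error checking. */
  static PyObject *build_by_set_item(Py_ssize_t n)
  {
  	PyObject *list = PyList_New(n); /* all slots start out NULL */
  	Py_ssize_t i;
  	if (!list)
  		return NULL;
  	for (i = 0; i < n; i++) {
  		PyObject *item = PyLong_FromSsize_t(i);
  		if (!item) {
  			Py_DECREF(list); /* NULL slots are safe to dealloc */
  			return NULL;
  		}
  		PyList_SET_ITEM(list, i, item); /* steals the reference */
  	}
  	return list;
  }

PyList_SET_ITEM() skips bounds and error checking and steals the
reference, which is where the speedup comes from; the cost is having
to know (or estimate) the size up front.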

> --- a/mercurial/parsers.c
> +++ b/mercurial/parsers.c
> @@ -2792,33 +2792,77 @@ static PyObject *fm1readmarkers(PyObject
>  	int datalen;
>  	Py_ssize_t offset, stop;
>  	PyObject *markers = NULL;
> +	PyObject **tmpmarkers = NULL;
> +	Py_ssize_t tmpmarkerssize;
> +	Py_ssize_t tmpmarkersoffset = 0;
>  
>  	if (!PyArg_ParseTuple(args, "s#nn", &data, &datalen, &offset, &stop)) {
>  		return NULL;
>  	}
>  	dataend = data + datalen;
>  	data += offset;
> -	markers = PyList_New(0);
> -	if (!markers) {
> +
> +	/*
> +	   We store an array of pointers to markers then build a list from
> +	   it at the end. This is faster than appending to a PyList for each
> +	   element.
> +
> +	   Reallocating the array constantly could be expensive. We estimate
> +	   the size of the array by assuming 90 bytes per marker, which is
> +	   based on data seen in the wild.
> +	*/
> +	tmpmarkerssize = datalen / 90;
> +	/* Handle datalen < 90 */
> +	tmpmarkerssize = tmpmarkerssize ? tmpmarkerssize : 10;
> +
> +	tmpmarkers = (PyObject **)PyMem_Malloc(
> +		tmpmarkerssize * sizeof(PyObject *));
> +	if (!tmpmarkers) {
> +		PyErr_NoMemory();
>  		return NULL;
>  	}

Maybe we could instead do PyList_New(estimated_capacity) and shrink the
list at the end with PyList_SetSlice(). The API doc doesn't state that
PyList_SetSlice() can handle NULL items, but builtin_zip() appears to
rely on exactly that:

  Python/bltinmodule.c:builtin_zip()
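
A minimal sketch of that idea (untested, hypothetical names; it
assumes, like builtin_zip(), that PyList_SetSlice() tolerates the
trailing NULL slots left by PyList_New()):

  #include <Python.h>

  /* Preallocate an oversized list, fill it directly, then trim the
     unused tail instead of growing through PyList_Append(). */
  static PyObject *parse_with_estimate(Py_ssize_t estimated)
  {
  	PyObject *markers = PyList_New(estimated); /* slots start NULL */
  	Py_ssize_t count = 0;
  	if (!markers)
  		return NULL;
  	while (count < estimated /* && more data left to parse */) {
  		/* stand-in for building a real marker tuple */
  		PyObject *marker = PyLong_FromSsize_t(count);
  		if (!marker) {
  			Py_DECREF(markers);
  			return NULL;
  		}
  		PyList_SET_ITEM(markers, count++, marker);
  	}
  	/* Trim list[count:estimated]; the deleted slice is all NULL
  	   slots, the same situation builtin_zip() ends up in when an
  	   iterator stops early. */
  	if (PyList_SetSlice(markers, count, estimated, NULL) < 0) {
  		Py_DECREF(markers);
  		return NULL;
  	}
  	return markers;
  }

This would drop the manual PyMem_Malloc() bookkeeping entirely, though
the parser would still need a growth path (or a conservative estimate)
for the case where the guess turns out to be too small.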

> +	free(tmpmarkers);

This should be PyMem_Free() for consistency with the PyMem_Malloc()
above.
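
I.e. memory from the PyMem family should go back to the PyMem family;
a minimal sketch:

  #include <Python.h>

  static PyObject *alloc_example(Py_ssize_t n)
  {
  	PyObject **buf = (PyObject **)PyMem_Malloc(n * sizeof(PyObject *));
  	if (!buf)
  		return PyErr_NoMemory();
  	/* ... use buf ... */
  	PyMem_Free(buf); /* not free(): PyMem_Malloc() is not guaranteed
  			    to come from the same allocator as malloc() */
  	Py_RETURN_NONE;
  }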

