D1973: bdiff: write a native version of splitnewlines

Fri Feb 2 08:34:36 EST 2018

yuja added inline comments.

INLINE COMMENTS

> bdiff.c:197
> +	const char *text;
> +	int i, start = 0;
> +	Py_ssize_t nelts = 0, size;

Perhaps Py_ssize_t is preferred.

> bdiff.c:202
> +	if (!PyArg_ParseTuple(args, "s#", &text, &size))
> +		goto abort;
> +	if (!size) {

Here `result` isn't initialized to NULL yet.

> indygreg wrote in bdiff.c:216-217
> I have a feeling this extra line scan will matter in a benchmark. Could you `perf record` the new `hg perf*` command and verify? If it is a big deal, I would allocate an `int[16384]` array on the stack or something to deal with the common case and store additional newline offsets on the heap so we only do the newline scan once.

I don't know which is faster, but if we do preallocate a buffer,  we can create
a large `PyList` and shrink it by `PyList_SetSlice(..., NULL)` at the end.

`builtin_filter()` of Python 2 do that.

> indygreg wrote in mdiff.py:43
> @yuja may have an opinion on this, but mine is that we now have stronger guarantees around C extension versioning, so if the active module policy is to use C extensions, we should ensure we get `splitnewlines` from the C module. I /think/ if we move `splitnewlines` to the `bdiff` module that things will sort themselves out? This could be done as a follow-up easily enough though.

> if we move splitnewlines to the bdiff module that things will sort themselves out

Yes, splitnewlines should be moved to pure/bdiff.py. And we need to bump the C API version.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1973

To: durin42, #hg-reviewers, indygreg
Cc: indygreg, yuja, mercurial-devel