[PATCH 1 of 5 V5] dirstate: rename the dirstate parsing and packing methods
Laurent Charignon
lcharignon at fb.com
Sat Dec 19 02:44:37 CST 2015
Thanks Matt and Yuya! I will benchmark it on the same repo as Durham to see the numbers because it seems to contradict the order of magnitude that Durham had. I will send a V6 next week.
Thanks,
Laurent
Sent from my iPhone
> On Dec 18, 2015, at 10:39 PM, Yuya Nishihara <yuya at tcha.org> wrote:
>
> On Sat, 19 Dec 2015 03:14:06 +0000, Laurent Charignon wrote:
>>> On Dec 18, 2015, at 3:02 PM, Matt Mackall <mpm at selenic.com> wrote:
>>>> On Thu, 2015-12-17 at 15:44 -0800, Laurent Charignon wrote:
>>>> # HG changeset patch
>>>> # User Laurent Charignon <lcharignon at fb.com>
>>>> # Date 1450380353 28800
>>>> # Thu Dec 17 11:25:53 2015 -0800
>>>> # Node ID 44eafafb98c9d24bdff7d6c46213ffe2cf8edc8d
>>>> # Parent 7e8a883da1711008262150bb6f64131812de3e0b
>>>> dirstate: rename the dirstate parsing and packing methods
>>>
>>> Explodes my hg on apply. The pure/ code is not available for fallback
>>> unless the C module doesn't exist at all, fallback doesn't happen on a
>>> function by function basis.
>>>
>>> Consider this strategy:
>>>
>>> - don't touch parse_dirstate
>>> - don't touch the read/write methods
>>> - add a simple Python func (in dirstate, not pure) to build
>>> nonnormalmap from map
>>> - attach this to a propertycache so we only build it when needed
>>> - add a C version of that function in parsers
>>> - add a clause in Python function to call C function if available
>>
>> Thanks for the review.
>>
>> Your solution should work but it forces us to pick one of the following way
>> build the non-normal map:
>> 1) build the non-normal map from map (ad you suggest)
>> 2) parse the dirstate file again
>>
>> Both 1) and 2) come at a performance cost that is prohibitive according to
>> Durham's first email on the topic sent to the list.
>> The time breakdown he sent was:
>> "A) parsing the data (370ms)
>> B) iterating over the dirstate looking for added/removed/lookup files (350ms)
>> C) 100ms of GC time
>> D) 60ms of reading the raw data off disk"
>>
>> The goal of this series was to eliminate the cost of B by increasing the cost
>> of A a little bit.
>> If we go with 1) or 2), it comes down to cutting ~350ms and adding either
>> ~350ms or ~370ms.
>
> Assuming the question here is how fast (1) or (B) would be in C, I've tried
> a quick benchmark. If (1) is fast enough, we won't have to care for the
> consistency of cached nonnormalmap or set.
>
> Command:
> % python -m timeit -s \
> 'from mercurial import hg, ui, parsers; \
> repo = hg.repository(ui.ui(), "mozilla-central"); \
> m = repo.dirstate._map' \
> 'parsers.select_nonnomal_dirstate(m)'
>
> Result:
> HGMODULEPOLICY=c 100 loops, best of 3: 3.15 msec per loop
> HGMODULEPOLICY=py 10 loops, best of 3: 31.7 msec per loop
>
> Patch (returns set, which could be a list?):
> diff --git a/mercurial/parsers.c b/mercurial/parsers.c
> --- a/mercurial/parsers.c
> +++ b/mercurial/parsers.c
> @@ -675,6 +675,41 @@ bail:
> return NULL;
> }
>
> +static PyObject *select_nonnomal_dirstate(PyObject *self, PyObject *args)
> +{
> + PyObject *dmap, *nonnset = NULL, *fname, *v;
> + Py_ssize_t pos;
> +
> + if (!PyArg_ParseTuple(args, "O!:select_nonnomal_dirstate",
> + &PyDict_Type, &dmap))
> + goto bail;
> +
> + nonnset = PySet_New(NULL);
> + if (nonnset == NULL)
> + goto bail;
> +
> + pos = 0;
> + while (PyDict_Next(dmap, &pos, &fname, &v)) {
> + dirstateTupleObject *t;
> + if (!dirstate_tuple_check(v)) {
> + PyErr_SetString(PyExc_TypeError,
> + "expected a dirstate tuple");
> + goto bail;
> + }
> + t = (dirstateTupleObject *)v;
> +
> + if (t->state == 'n' && t->mtime != -1)
> + continue;
> + if (PySet_Add(nonnset, fname) == -1)
> + goto bail;
> + }
> +
> + return nonnset;
> +bail:
> + Py_XDECREF(nonnset);
> + return NULL;
> +}
> +
> /*
> * A base-16 trie for fast node->rev mapping.
> *
> @@ -2742,6 +2777,8 @@ static PyMethodDef methods[] = {
> {"pack_dirstate", pack_dirstate, METH_VARARGS, "pack a dirstate\n"},
> {"parse_manifest", parse_manifest, METH_VARARGS, "parse a manifest\n"},
> {"parse_dirstate", parse_dirstate, METH_VARARGS, "parse a dirstate\n"},
> + {"select_nonnomal_dirstate", select_nonnomal_dirstate, METH_VARARGS,
> + "create a set containing non-normal entries of given dirstate\n"},
> {"parse_index2", parse_index2, METH_VARARGS, "parse a revlog index\n"},
> {"asciilower", asciilower, METH_VARARGS, "lowercase an ASCII string\n"},
> {"asciiupper", asciiupper, METH_VARARGS, "uppercase an ASCII string\n"},
> diff --git a/mercurial/pure/parsers.py b/mercurial/pure/parsers.py
> --- a/mercurial/pure/parsers.py
> +++ b/mercurial/pure/parsers.py
> @@ -113,3 +113,7 @@ def pack_dirstate(dmap, copymap, pl, now
> write(e)
> write(f)
> return cs.getvalue()
> +
> +def select_nonnomal_dirstate(dmap):
> + return set(fname for fname, e in dmap.iteritems()
> + if e[0] != 'n' or e[3] == -1)
More information about the Mercurial-devel
mailing list