[PATCH 1 of 5 V5] dirstate: rename the dirstate parsing and packing methods
Laurent Charignon
lcharignon at fb.com
Sat Dec 19 11:34:53 CST 2015
> On Dec 19, 2015, at 12:44 AM, Laurent Charignon <lcharignon at fb.com> wrote:
>
> Thanks Matt and Yuya! I will benchmark it on the same repo as Durham to see the numbers because it seems to contradict the order of magnitude that Durham had. I will send a V6 next week.
Thanks Yuya for the Pro-Tip to quickly benchmark an individual function, I had no idea how to do that well before.
I can reproduce Durham's number on our large repos with Yuya's python code: ~350ms for building the map.
However, it looks like computing the non-normal map in C on the same big repo would cost ~20ms, that's a big win, so, I will follow Matt's plan!
Thanks,
Laurent
>
> Thanks,
>
> Laurent
>
> Sent from my iPhone
>
>> On Dec 18, 2015, at 10:39 PM, Yuya Nishihara <yuya at tcha.org> wrote:
>>
>> On Sat, 19 Dec 2015 03:14:06 +0000, Laurent Charignon wrote:
>>>> On Dec 18, 2015, at 3:02 PM, Matt Mackall <mpm at selenic.com> wrote:
>>>>> On Thu, 2015-12-17 at 15:44 -0800, Laurent Charignon wrote:
>>>>> # HG changeset patch
>>>>> # User Laurent Charignon <lcharignon at fb.com>
>>>>> # Date 1450380353 28800
>>>>> # Thu Dec 17 11:25:53 2015 -0800
>>>>> # Node ID 44eafafb98c9d24bdff7d6c46213ffe2cf8edc8d
>>>>> # Parent 7e8a883da1711008262150bb6f64131812de3e0b
>>>>> dirstate: rename the dirstate parsing and packing methods
>>>>
>>>> Explodes my hg on apply. The pure/ code is not available for fallback
>>>> unless the C module doesn't exist at all, fallback doesn't happen on a
>>>> function by function basis.
>>>>
>>>> Consider this strategy:
>>>>
>>>> - don't touch parse_dirstate
>>>> - don't touch the read/write methods
>>>> - add a simple Python func (in dirstate, not pure) to build
>>>> nonnormalmap from map
>>>> - attach this to a propertycache so we only build it when needed
>>>> - add a C version of that function in parsers
>>>> - add a clause in Python function to call C function if available
>>>
>>> Thanks for the review.
>>>
>>> Your solution should work but it forces us to pick one of the following way
>>> build the non-normal map:
>>> 1) build the non-normal map from map (ad you suggest)
>>> 2) parse the dirstate file again
>>>
>>> Both 1) and 2) come at a performance cost that is prohibitive according to
>>> Durham's first email on the topic sent to the list.
>>> The time breakdown he sent was:
>>> "A) parsing the data (370ms)
>>> B) iterating over the dirstate looking for added/removed/lookup files (350ms)
>>> C) 100ms of GC time
>>> D) 60ms of reading the raw data off disk"
>>>
>>> The goal of this series was to eliminate the cost of B by increasing the cost
>>> of A a little bit.
>>> If we go with 1) or 2), it comes down to cutting ~350ms and adding either
>>> ~350ms or ~370ms.
>>
>> Assuming the question here is how fast (1) or (B) would be in C, I've tried
>> a quick benchmark. If (1) is fast enough, we won't have to care for the
>> consistency of cached nonnormalmap or set.
>>
>> Command:
>> % python -m timeit -s \
>> 'from mercurial import hg, ui, parsers; \
>> repo = hg.repository(ui.ui(), "mozilla-central"); \
>> m = repo.dirstate._map' \
>> 'parsers.select_nonnomal_dirstate(m)'
>>
>> Result:
>> HGMODULEPOLICY=c 100 loops, best of 3: 3.15 msec per loop
>> HGMODULEPOLICY=py 10 loops, best of 3: 31.7 msec per loop
>>
>> Patch (returns set, which could be a list?):
>> diff --git a/mercurial/parsers.c b/mercurial/parsers.c
>> --- a/mercurial/parsers.c
>> +++ b/mercurial/parsers.c
>> @@ -675,6 +675,41 @@ bail:
>> return NULL;
>> }
>>
>> +static PyObject *select_nonnomal_dirstate(PyObject *self, PyObject *args)
>> +{
>> + PyObject *dmap, *nonnset = NULL, *fname, *v;
>> + Py_ssize_t pos;
>> +
>> + if (!PyArg_ParseTuple(args, "O!:select_nonnomal_dirstate",
>> + &PyDict_Type, &dmap))
>> + goto bail;
>> +
>> + nonnset = PySet_New(NULL);
>> + if (nonnset == NULL)
>> + goto bail;
>> +
>> + pos = 0;
>> + while (PyDict_Next(dmap, &pos, &fname, &v)) {
>> + dirstateTupleObject *t;
>> + if (!dirstate_tuple_check(v)) {
>> + PyErr_SetString(PyExc_TypeError,
>> + "expected a dirstate tuple");
>> + goto bail;
>> + }
>> + t = (dirstateTupleObject *)v;
>> +
>> + if (t->state == 'n' && t->mtime != -1)
>> + continue;
>> + if (PySet_Add(nonnset, fname) == -1)
>> + goto bail;
>> + }
>> +
>> + return nonnset;
>> +bail:
>> + Py_XDECREF(nonnset);
>> + return NULL;
>> +}
>> +
>> /*
>> * A base-16 trie for fast node->rev mapping.
>> *
>> @@ -2742,6 +2777,8 @@ static PyMethodDef methods[] = {
>> {"pack_dirstate", pack_dirstate, METH_VARARGS, "pack a dirstate\n"},
>> {"parse_manifest", parse_manifest, METH_VARARGS, "parse a manifest\n"},
>> {"parse_dirstate", parse_dirstate, METH_VARARGS, "parse a dirstate\n"},
>> + {"select_nonnomal_dirstate", select_nonnomal_dirstate, METH_VARARGS,
>> + "create a set containing non-normal entries of given dirstate\n"},
>> {"parse_index2", parse_index2, METH_VARARGS, "parse a revlog index\n"},
>> {"asciilower", asciilower, METH_VARARGS, "lowercase an ASCII string\n"},
>> {"asciiupper", asciiupper, METH_VARARGS, "uppercase an ASCII string\n"},
>> diff --git a/mercurial/pure/parsers.py b/mercurial/pure/parsers.py
>> --- a/mercurial/pure/parsers.py
>> +++ b/mercurial/pure/parsers.py
>> @@ -113,3 +113,7 @@ def pack_dirstate(dmap, copymap, pl, now
>> write(e)
>> write(f)
>> return cs.getvalue()
>> +
>> +def select_nonnomal_dirstate(dmap):
>> + return set(fname for fname, e in dmap.iteritems()
>> + if e[0] != 'n' or e[3] == -1)
More information about the Mercurial-devel
mailing list