[PATCH 1 of 5 V5] dirstate: rename the dirstate parsing and packing methods

Laurent Charignon lcharignon at fb.com
Sat Dec 19 11:34:53 CST 2015


> On Dec 19, 2015, at 12:44 AM, Laurent Charignon <lcharignon at fb.com> wrote:
> 
> Thanks Matt and Yuya! I will benchmark it on the same repo as Durham to see the numbers because it seems to contradict the order of magnitude that Durham had. I will send a V6 next week.

Thanks Yuya for the Pro-Tip to quickly benchmark an individual function, I had no idea how to do that well before.

I can reproduce Durham's number on our large repos with Yuya's python code: ~350ms for building the map.
However, it looks like computing the non-normal map in C on the same big repo would cost ~20ms, that's a big win, so, I will follow Matt's plan!

Thanks,

Laurent 

> 
> Thanks,
> 
> Laurent
> 
> Sent from my iPhone
> 
>> On Dec 18, 2015, at 10:39 PM, Yuya Nishihara <yuya at tcha.org> wrote:
>> 
>> On Sat, 19 Dec 2015 03:14:06 +0000, Laurent Charignon wrote:
>>>> On Dec 18, 2015, at 3:02 PM, Matt Mackall <mpm at selenic.com> wrote:
>>>>> On Thu, 2015-12-17 at 15:44 -0800, Laurent Charignon wrote:
>>>>> # HG changeset patch
>>>>> # User Laurent Charignon <lcharignon at fb.com>
>>>>> # Date 1450380353 28800
>>>>> #      Thu Dec 17 11:25:53 2015 -0800
>>>>> # Node ID 44eafafb98c9d24bdff7d6c46213ffe2cf8edc8d
>>>>> # Parent 7e8a883da1711008262150bb6f64131812de3e0b
>>>>> dirstate: rename the dirstate parsing and packing methods
>>>> 
>>>> Explodes my hg on apply. The pure/ code is not available for fallback
>>>> unless the C module doesn't exist at all, fallback doesn't happen on a
>>>> function by function basis.
>>>> 
>>>> Consider this strategy:
>>>> 
>>>> - don't touch parse_dirstate
>>>> - don't touch the read/write methods
>>>> - add a simple Python func (in dirstate, not pure) to build
>>>> nonnormalmap from map
>>>> - attach this to a propertycache so we only build it when needed
>>>> - add a C version of that function in parsers
>>>> - add a clause in Python function to call C function if available
>>> 
>>> Thanks for the review.
>>> 
>>> Your solution should work but it forces us to pick one of the following way
>>> build the non-normal map:
>>> 1) build the non-normal map from map (ad you suggest)
>>> 2) parse the dirstate file again
>>> 
>>> Both 1) and 2) come at a performance cost that is prohibitive according to
>>> Durham's first email on the topic sent to the list.
>>> The time breakdown he sent was:
>>> "A) parsing the data (370ms)
>>> B) iterating over the dirstate looking for added/removed/lookup files (350ms)
>>> C) 100ms of GC time
>>> D) 60ms of reading the raw data off disk"
>>> 
>>> The goal of this series was to eliminate the cost of B by increasing the cost
>>> of A a little bit.
>>> If we go with 1) or 2), it comes down to cutting ~350ms and adding either
>>> ~350ms or ~370ms.
>> 
>> Assuming the question here is how fast (1) or (B) would be in C, I've tried
>> a quick benchmark. If (1) is fast enough, we won't have to care for the
>> consistency of cached nonnormalmap or set.
>> 
>> Command:
>> % python -m timeit -s \
>> 'from mercurial import hg, ui, parsers; \
>> repo = hg.repository(ui.ui(), "mozilla-central"); \
>> m = repo.dirstate._map' \
>> 'parsers.select_nonnomal_dirstate(m)'
>> 
>> Result:
>> HGMODULEPOLICY=c  100 loops, best of 3: 3.15 msec per loop
>> HGMODULEPOLICY=py 10 loops, best of 3: 31.7 msec per loop
>> 
>> Patch (returns set, which could be a list?):
>> diff --git a/mercurial/parsers.c b/mercurial/parsers.c
>> --- a/mercurial/parsers.c
>> +++ b/mercurial/parsers.c
>> @@ -675,6 +675,41 @@ bail:
>>   return NULL;
>> }
>> 
>> +static PyObject *select_nonnomal_dirstate(PyObject *self, PyObject *args)
>> +{
>> +    PyObject *dmap, *nonnset = NULL, *fname, *v;
>> +    Py_ssize_t pos;
>> +
>> +    if (!PyArg_ParseTuple(args, "O!:select_nonnomal_dirstate",
>> +                  &PyDict_Type, &dmap))
>> +        goto bail;
>> +
>> +    nonnset = PySet_New(NULL);
>> +    if (nonnset == NULL)
>> +        goto bail;
>> +
>> +    pos = 0;
>> +    while (PyDict_Next(dmap, &pos, &fname, &v)) {
>> +        dirstateTupleObject *t;
>> +        if (!dirstate_tuple_check(v)) {
>> +            PyErr_SetString(PyExc_TypeError,
>> +                    "expected a dirstate tuple");
>> +            goto bail;
>> +        }
>> +        t = (dirstateTupleObject *)v;
>> +
>> +        if (t->state == 'n' && t->mtime != -1)
>> +            continue;
>> +        if (PySet_Add(nonnset, fname) == -1)
>> +            goto bail;
>> +    }
>> +
>> +    return nonnset;
>> +bail:
>> +    Py_XDECREF(nonnset);
>> +    return NULL;
>> +}
>> +
>> /*
>> * A base-16 trie for fast node->rev mapping.
>> *
>> @@ -2742,6 +2777,8 @@ static PyMethodDef methods[] = {
>>   {"pack_dirstate", pack_dirstate, METH_VARARGS, "pack a dirstate\n"},
>>   {"parse_manifest", parse_manifest, METH_VARARGS, "parse a manifest\n"},
>>   {"parse_dirstate", parse_dirstate, METH_VARARGS, "parse a dirstate\n"},
>> +    {"select_nonnomal_dirstate", select_nonnomal_dirstate, METH_VARARGS,
>> +     "create a set containing non-normal entries of given dirstate\n"},
>>   {"parse_index2", parse_index2, METH_VARARGS, "parse a revlog index\n"},
>>   {"asciilower", asciilower, METH_VARARGS, "lowercase an ASCII string\n"},
>>   {"asciiupper", asciiupper, METH_VARARGS, "uppercase an ASCII string\n"},
>> diff --git a/mercurial/pure/parsers.py b/mercurial/pure/parsers.py
>> --- a/mercurial/pure/parsers.py
>> +++ b/mercurial/pure/parsers.py
>> @@ -113,3 +113,7 @@ def pack_dirstate(dmap, copymap, pl, now
>>        write(e)
>>        write(f)
>>    return cs.getvalue()
>> +
>> +def select_nonnomal_dirstate(dmap):
>> +    return set(fname for fname, e in dmap.iteritems()
>> +               if e[0] != 'n' or e[3] == -1)



More information about the Mercurial-devel mailing list