[PATCH 1 of 5 V5] dirstate: rename the dirstate parsing and packing methods

Laurent Charignon lcharignon at fb.com
Sat Dec 19 02:44:37 CST 2015


Thanks Matt and Yuya! I will benchmark it on the same repo as Durham to see the numbers because it seems to contradict the order of magnitude that Durham had. I will send a V6 next week.

Thanks,

Laurent

Sent from my iPhone

> On Dec 18, 2015, at 10:39 PM, Yuya Nishihara <yuya at tcha.org> wrote:
> 
> On Sat, 19 Dec 2015 03:14:06 +0000, Laurent Charignon wrote:
>>> On Dec 18, 2015, at 3:02 PM, Matt Mackall <mpm at selenic.com> wrote:
>>>> On Thu, 2015-12-17 at 15:44 -0800, Laurent Charignon wrote:
>>>> # HG changeset patch
>>>> # User Laurent Charignon <lcharignon at fb.com>
>>>> # Date 1450380353 28800
>>>> #      Thu Dec 17 11:25:53 2015 -0800
>>>> # Node ID 44eafafb98c9d24bdff7d6c46213ffe2cf8edc8d
>>>> # Parent  7e8a883da1711008262150bb6f64131812de3e0b
>>>> dirstate: rename the dirstate parsing and packing methods
>>> 
>>> Explodes my hg on apply. The pure/ code is not available for fallback
>>> unless the C module doesn't exist at all, fallback doesn't happen on a
>>> function by function basis.
>>> 
>>> Consider this strategy:
>>> 
>>> - don't touch parse_dirstate
>>> - don't touch the read/write methods
>>> - add a simple Python func (in dirstate, not pure) to build
>>> nonnormalmap from map
>>> - attach this to a propertycache so we only build it when needed
>>> - add a C version of that function in parsers
>>> - add a clause in Python function to call C function if available
>> 
>> Thanks for the review.
>> 
>> Your solution should work but it forces us to pick one of the following way
>> build the non-normal map:
>> 1) build the non-normal map from map (ad you suggest)
>> 2) parse the dirstate file again
>> 
>> Both 1) and 2) come at a performance cost that is prohibitive according to
>> Durham's first email on the topic sent to the list.
>> The time breakdown he sent was:
>> "A) parsing the data (370ms)
>> B) iterating over the dirstate looking for added/removed/lookup files (350ms)
>> C) 100ms of GC time
>> D) 60ms of reading the raw data off disk"
>> 
>> The goal of this series was to eliminate the cost of B by increasing the cost
>> of A a little bit.
>> If we go with 1) or 2), it comes down to cutting ~350ms and adding either
>> ~350ms or ~370ms.
> 
> Assuming the question here is how fast (1) or (B) would be in C, I've tried
> a quick benchmark. If (1) is fast enough, we won't have to care for the
> consistency of cached nonnormalmap or set.
> 
> Command:
> % python -m timeit -s \
> 'from mercurial import hg, ui, parsers; \
> repo = hg.repository(ui.ui(), "mozilla-central"); \
> m = repo.dirstate._map' \
> 'parsers.select_nonnomal_dirstate(m)'
> 
> Result:
> HGMODULEPOLICY=c  100 loops, best of 3: 3.15 msec per loop
> HGMODULEPOLICY=py 10 loops, best of 3: 31.7 msec per loop
> 
> Patch (returns set, which could be a list?):
> diff --git a/mercurial/parsers.c b/mercurial/parsers.c
> --- a/mercurial/parsers.c
> +++ b/mercurial/parsers.c
> @@ -675,6 +675,41 @@ bail:
>    return NULL;
> }
> 
> +static PyObject *select_nonnomal_dirstate(PyObject *self, PyObject *args)
> +{
> +    PyObject *dmap, *nonnset = NULL, *fname, *v;
> +    Py_ssize_t pos;
> +
> +    if (!PyArg_ParseTuple(args, "O!:select_nonnomal_dirstate",
> +                  &PyDict_Type, &dmap))
> +        goto bail;
> +
> +    nonnset = PySet_New(NULL);
> +    if (nonnset == NULL)
> +        goto bail;
> +
> +    pos = 0;
> +    while (PyDict_Next(dmap, &pos, &fname, &v)) {
> +        dirstateTupleObject *t;
> +        if (!dirstate_tuple_check(v)) {
> +            PyErr_SetString(PyExc_TypeError,
> +                    "expected a dirstate tuple");
> +            goto bail;
> +        }
> +        t = (dirstateTupleObject *)v;
> +
> +        if (t->state == 'n' && t->mtime != -1)
> +            continue;
> +        if (PySet_Add(nonnset, fname) == -1)
> +            goto bail;
> +    }
> +
> +    return nonnset;
> +bail:
> +    Py_XDECREF(nonnset);
> +    return NULL;
> +}
> +
> /*
>  * A base-16 trie for fast node->rev mapping.
>  *
> @@ -2742,6 +2777,8 @@ static PyMethodDef methods[] = {
>    {"pack_dirstate", pack_dirstate, METH_VARARGS, "pack a dirstate\n"},
>    {"parse_manifest", parse_manifest, METH_VARARGS, "parse a manifest\n"},
>    {"parse_dirstate", parse_dirstate, METH_VARARGS, "parse a dirstate\n"},
> +    {"select_nonnomal_dirstate", select_nonnomal_dirstate, METH_VARARGS,
> +     "create a set containing non-normal entries of given dirstate\n"},
>    {"parse_index2", parse_index2, METH_VARARGS, "parse a revlog index\n"},
>    {"asciilower", asciilower, METH_VARARGS, "lowercase an ASCII string\n"},
>    {"asciiupper", asciiupper, METH_VARARGS, "uppercase an ASCII string\n"},
> diff --git a/mercurial/pure/parsers.py b/mercurial/pure/parsers.py
> --- a/mercurial/pure/parsers.py
> +++ b/mercurial/pure/parsers.py
> @@ -113,3 +113,7 @@ def pack_dirstate(dmap, copymap, pl, now
>         write(e)
>         write(f)
>     return cs.getvalue()
> +
> +def select_nonnomal_dirstate(dmap):
> +    return set(fname for fname, e in dmap.iteritems()
> +               if e[0] != 'n' or e[3] == -1)


More information about the Mercurial-devel mailing list