[PATCH 1 of 5 V5] dirstate: rename the dirstate parsing and packing methods

Yuya Nishihara yuya at tcha.org
Sat Dec 19 00:36:56 CST 2015


On Sat, 19 Dec 2015 03:14:06 +0000, Laurent Charignon wrote:
> > On Dec 18, 2015, at 3:02 PM, Matt Mackall <mpm at selenic.com> wrote:
> > On Thu, 2015-12-17 at 15:44 -0800, Laurent Charignon wrote:
> >> # HG changeset patch
> >> # User Laurent Charignon <lcharignon at fb.com>
> >> # Date 1450380353 28800
> >> #      Thu Dec 17 11:25:53 2015 -0800
> >> # Node ID 44eafafb98c9d24bdff7d6c46213ffe2cf8edc8d
> >> # Parent  7e8a883da1711008262150bb6f64131812de3e0b
> >> dirstate: rename the dirstate parsing and packing methods
> > 
> > Explodes my hg on apply. The pure/ code is not available for fallback
> > unless the C module doesn't exist at all, fallback doesn't happen on a
> > function by function basis.
> > 
> > Consider this strategy:
> > 
> > - don't touch parse_dirstate
> > - don't touch the read/write methods
> > - add a simple Python func (in dirstate, not pure) to build
> > nonnormalmap from map
> > - attach this to a propertycache so we only build it when needed
> > - add a C version of that function in parsers
> > - add a clause in Python function to call C function if available
> 
> Thanks for the review.
> 
> Your solution should work but it forces us to pick one of the following way
> build the non-normal map:
> 1) build the non-normal map from map (ad you suggest)
> 2) parse the dirstate file again
> 
> Both 1) and 2) come at a performance cost that is prohibitive according to
> Durham's first email on the topic sent to the list.
> The time breakdown he sent was:
> "A) parsing the data (370ms)
> B) iterating over the dirstate looking for added/removed/lookup files (350ms)
> C) 100ms of GC time
> D) 60ms of reading the raw data off disk"
> 
> The goal of this series was to eliminate the cost of B by increasing the cost
> of A a little bit.
> If we go with 1) or 2), it comes down to cutting ~350ms and adding either
> ~350ms or ~370ms.

Assuming the question here is how fast (1) or (B) would be in C, I've tried
a quick benchmark. If (1) is fast enough, we won't have to care for the
consistency of cached nonnormalmap or set.

Command:
% python -m timeit -s \
'from mercurial import hg, ui, parsers; \
repo = hg.repository(ui.ui(), "mozilla-central"); \
m = repo.dirstate._map' \
'parsers.select_nonnomal_dirstate(m)'

Result:
HGMODULEPOLICY=c  100 loops, best of 3: 3.15 msec per loop
HGMODULEPOLICY=py 10 loops, best of 3: 31.7 msec per loop

Patch (returns set, which could be a list?):
diff --git a/mercurial/parsers.c b/mercurial/parsers.c
--- a/mercurial/parsers.c
+++ b/mercurial/parsers.c
@@ -675,6 +675,41 @@ bail:
 	return NULL;
 }
 
+static PyObject *select_nonnomal_dirstate(PyObject *self, PyObject *args)
+{
+	PyObject *dmap, *nonnset = NULL, *fname, *v;
+	Py_ssize_t pos;
+
+	if (!PyArg_ParseTuple(args, "O!:select_nonnomal_dirstate",
+			      &PyDict_Type, &dmap))
+		goto bail;
+
+	nonnset = PySet_New(NULL);
+	if (nonnset == NULL)
+		goto bail;
+
+	pos = 0;
+	while (PyDict_Next(dmap, &pos, &fname, &v)) {
+		dirstateTupleObject *t;
+		if (!dirstate_tuple_check(v)) {
+			PyErr_SetString(PyExc_TypeError,
+					"expected a dirstate tuple");
+			goto bail;
+		}
+		t = (dirstateTupleObject *)v;
+
+		if (t->state == 'n' && t->mtime != -1)
+			continue;
+		if (PySet_Add(nonnset, fname) == -1)
+			goto bail;
+	}
+
+	return nonnset;
+bail:
+	Py_XDECREF(nonnset);
+	return NULL;
+}
+
 /*
  * A base-16 trie for fast node->rev mapping.
  *
@@ -2742,6 +2777,8 @@ static PyMethodDef methods[] = {
 	{"pack_dirstate", pack_dirstate, METH_VARARGS, "pack a dirstate\n"},
 	{"parse_manifest", parse_manifest, METH_VARARGS, "parse a manifest\n"},
 	{"parse_dirstate", parse_dirstate, METH_VARARGS, "parse a dirstate\n"},
+	{"select_nonnomal_dirstate", select_nonnomal_dirstate, METH_VARARGS,
+	 "create a set containing non-normal entries of given dirstate\n"},
 	{"parse_index2", parse_index2, METH_VARARGS, "parse a revlog index\n"},
 	{"asciilower", asciilower, METH_VARARGS, "lowercase an ASCII string\n"},
 	{"asciiupper", asciiupper, METH_VARARGS, "uppercase an ASCII string\n"},
diff --git a/mercurial/pure/parsers.py b/mercurial/pure/parsers.py
--- a/mercurial/pure/parsers.py
+++ b/mercurial/pure/parsers.py
@@ -113,3 +113,7 @@ def pack_dirstate(dmap, copymap, pl, now
         write(e)
         write(f)
     return cs.getvalue()
+
+def select_nonnomal_dirstate(dmap):
+    return set(fname for fname, e in dmap.iteritems()
+               if e[0] != 'n' or e[3] == -1)


More information about the Mercurial-devel mailing list