[PATCH 2 of 4] parsers: write dirstate starting with non-normal entries

Wed Nov 25 08:24:54 CST 2015

On Tue, 24 Nov 2015 17:19:09 -0800, Laurent Charignon wrote:
> # HG changeset patch
> # User Laurent Charignon <lcharignon at fb.com>
> # Date 1448413597 28800
> #      Tue Nov 24 17:06:37 2015 -0800
> # Node ID ea9d03d4e85ea3949bb8d16bd9e1a80246a8247b
> # Parent  3bd86861a1618aabe6ec7f2cde1223282f9569be
> parsers: write dirstate starting with non-normal entries
> 
> Before this patch we were writing the dirstate entries in a "random" way,
> following the *unstable* order of a Python dictionary. This patch changes the
> order in which we write the dirstate entries.
> 
> We now start with the non-normal files (that have changed and likely to have
> changed) and end with the normal files. This makes the job of hg status easier
> as, in most cases, it will need to access the non-normal entries of the
> dirstate. This new ordering allows hg status to stop iterating over the dirstate
> after processing those entries.
> 
> On our large repos, for hg status, we achieve a 40% improvement.
> On the same repo, the cost of this change is a slowdown for writing the
> dirstate to disk (as we do two passes). I measured the execution time of
> hg debugrebuilddirstate with and without the change and observed a 5% slowdown
> for the overall command.
> 
> diff --git a/mercurial/parsers.c b/mercurial/parsers.c
> --- a/mercurial/parsers.c
> +++ b/mercurial/parsers.c
> @@ -602,7 +602,10 @@
>  	}
>  	memcpy(p, s, l);
>  	p += 20;
> -	if (0 == 0) {
> +	int pass;

New variable can't be declared here according to C89, and MSVC does complain it.

> +	/* First pass, non normal files, second pass normal files. This is to improve
> +	 * status performance as status generally only need the non normal files */
> +	for (pass = 0; pass <= 1; pass++) {
>  		for (pos = 0; PyDict_Next(map, &pos, &k, &v); ) {
>  			dirstateTupleObject *tuple;
>  			char state;
> @@ -610,6 +613,7 @@
>  			Py_ssize_t len, l;
>  			PyObject *o;
>  			char *t;
> +			int normal;
>  
>  			if (!dirstate_tuple_check(v)) {
>  				PyErr_SetString(PyExc_TypeError,
> @@ -622,6 +626,9 @@
>  			mode = tuple->mode;
>  			size = tuple->size;
>  			mtime = tuple->mtime;
> +			normal = (state == 'n' && mtime != -1);
> +			if (normal != pass)
> +				continue;

As we have the pass to figure out memory size before the "first pass", maybe
we can know the offset to the first "normal" file beforehand. Then, we won't
need the "second pass".