[PATCH V2] bdiff: don't check border condition in loop

Gregory Szorc gregory.szorc at gmail.com
Mon Nov 21 00:57:03 UTC 2016


# HG changeset patch
# User Gregory Szorc <gregory.szorc at gmail.com>
# Date 1479689781 28800
#      Sun Nov 20 16:56:21 2016 -0800
# Node ID d8d336c2dd0dc1e9928d49d805c880394969fad6
# Parent  9375077f1ace71dfa2fc87a1d4eaeae8de267e20
bdiff: don't check border condition in loop

This is essentially the same change as d500ddae7494, applied to a
different loop.

The condition `p == plast` (where `plast` is `a + len - 1`) is only
true on the final iteration of the loop, so checking it on every
iteration is wasteful. We decrease the iteration count by 1 and handle
the final byte with an explicit `p == plast` check after the loop.
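
As a rough illustration of the transformation described above (not the
actual bdiff code), the sketch below counts lines both ways. The
function names are hypothetical, and the early return for `len == 0`
is an assumption made so the sketch never forms `a + len - 1` on an
empty buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: tests for the last byte on every iteration. */
static size_t count_lines_checked(const char *a, size_t len)
{
	const char *p, *plast;
	size_t n = 0;

	if (len == 0)
		return 0;
	plast = a + len - 1;

	for (p = a; p < a + len; p++)
		if (*p == '\n' || p == plast)
			n++;
	return n;
}

/* Hypothetical: stops one byte early and handles the last byte once. */
static size_t count_lines_peeled(const char *a, size_t len)
{
	const char *p, *plast;
	size_t n = 0;

	if (len == 0)
		return 0;
	plast = a + len - 1;

	for (p = a; p < plast; p++)
		if (*p == '\n')
			n++;

	/* the final byte always ends a line, newline or not */
	if (p == plast)
		n++;
	return n;
}

/* Both variants should agree on any input buffer. */
static void check_equivalent(const char *a, size_t len)
{
	assert(count_lines_checked(a, len) == count_lines_peeled(a, len));
}
```

Calling `check_equivalent("a\nb\nc", 5)` exercises the case where the
buffer does not end in a newline, which is the case the post-loop
check exists for.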

Again, we see modest wins.

From the mozilla-unified repository:

$ perfbdiff -m 3041e4d59df2
! wall 0.035502 comb 0.040000 user 0.040000 sys 0.000000 (best of 100)
! wall 0.030480 comb 0.030000 user 0.030000 sys 0.000000 (best of 100)

$ perfbdiff 0e9928989e9c --alldata --count 100
! wall 4.097394 comb 4.100000 user 4.100000 sys 0.000000 (best of 3)
! wall 3.597798 comb 3.600000 user 3.600000 sys 0.000000 (best of 3)

The 2nd example throws a total of ~3.3GB of data at bdiff. This
change increases the throughput from ~811 MB/s to ~924 MB/s.
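
The two throughput figures can be sanity-checked against the wall
times: for a fixed amount of data, throughput scales with the ratio of
the old wall time to the new one. A minimal sketch, with hypothetical
helper names, assuming the same data is processed in both runs:

```c
#include <assert.h>

/* Hypothetical helper: throughput after a change, given the throughput
 * before it and the wall times of both runs over the same data. */
static double scaled_throughput(double mb_per_s_before,
				double wall_before, double wall_after)
{
	return mb_per_s_before * (wall_before / wall_after);
}

/* Self-check against the figures quoted above. */
static void check_reported_numbers(void)
{
	double after = scaled_throughput(811.0, 4.097394, 3.597798);

	/* ~811 MB/s scaled by the wall-time ratio lands near ~924 MB/s */
	assert(after > 923.0 && after < 925.0);
}
```

The wall-time ratio here is roughly 1.14, which matches the step from
~811 MB/s to ~924 MB/s.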

diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c
--- a/mercurial/bdiff.c
+++ b/mercurial/bdiff.c
@@ -47,10 +47,10 @@ int bdiff_splitlines(const char *a, ssiz
 
 	/* build the line array and calculate hashes */
 	hash = 0;
-	for (p = a; p < a + len; p++) {
+	for (p = a; p < plast; p++) {
 		hash = HASH(hash, *p);
 
-		if (*p == '\n' || p == plast) {
+		if (*p == '\n') {
 			l->hash = hash;
 			hash = 0;
 			l->len = p - b + 1;
@@ -61,6 +61,15 @@ int bdiff_splitlines(const char *a, ssiz
 		}
 	}
 
+	if (p == plast) {
+		hash = HASH(hash, *p);
+		l->hash = hash;
+		l->len = p - b + 1;
+		l->l = b;
+		l->n = INT_MAX;
+		l++;
+	}
+
 	/* set up a sentinel */
 	l->hash = 0;
 	l->len = 0;

