[PATCH] chg: handle EOF reading data block

Jun Wu quark at fb.com
Mon Jul 18 14:34:10 EDT 2016


# HG changeset patch
# User Jun Wu <quark at fb.com>
# Date 1468864506 -3600
#      Mon Jul 18 18:55:06 2016 +0100
# Node ID c063823d664a978415bde11e0e13a6c3983d1dd9
# Parent  953839de96ab574caa40557c542c262286c6287c
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r c063823d664a
chg: handle EOF reading data block

We recently discovered a case in production in which chg uses 100% CPU and
tries to read data forever:

  recvfrom(4, "", 1814012019, 0, NULL, NULL) = 0

Inspection with gdb suggests that readchannel() received bad data. It was
stuck in an infinite loop because rsize == 0 does not exit the loop, even
though the server process had already ended.

  (gdb) bt
  #0 ... in recv () at /lib64/libc.so.6
  #1 ... in readchannel (...) at /usr/include/bits/socket2.h:45
  #2 ... in readchannel (hgc=...) at hgclient.c:129
  #3 ... in handleresponse (hgc=...) at hgclient.c:255
  #4 ... in hgc_runcommand (hgc=..., args=<optimized>, argsize=<optimized>)
  #5 ... in main (argc=...486922636, argv=..., envp=...) at chg.c:661
  (gdb) frame 2
  (gdb) p *hgc
  $1 = {sockfd = 4, pid = 381152, ctx = {ch = 108 'l',
        data = 0x7fb05164f010 "st):\nTraceback (most recent call last):\n"
        "Traceback (most recent call last):\ne", maxdatasize = 1814065152,
        datasize = 1814064225}, capflags = 16131}

This patch addresses the infinite loop by detecting consecutive empty
responses and aborting in that case.
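
As a standalone illustration of that rule, here is a minimal sketch of a read
loop that gives up after too many consecutive zero-length reads. The names
read_block and MAX_EMPTY_READS are hypothetical and are not part of
hgclient.c; the limit of 20 simply mirrors the threshold in the diff, and the
sketch returns -1 where readchannel() calls abortmsg().

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical limit, mirroring the threshold of 20 used in the patch. */
#define MAX_EMPTY_READS 20

/*
 * Read exactly `size` bytes from `fd` into `buf`.
 * Returns 0 on success, or -1 on a recv() error or after more than
 * MAX_EMPTY_READS consecutive zero-length reads (treated as EOF).
 */
static int read_block(int fd, char *buf, size_t size)
{
	assert(buf != NULL);
	size_t cursize = 0;
	int emptycount = 0;
	while (cursize < size) {
		ssize_t rsize = recv(fd, buf + cursize, size - cursize, 0);
		if (rsize < 0)
			return -1;
		/* any non-empty read resets the counter */
		emptycount = (rsize == 0) ? emptycount + 1 : 0;
		if (emptycount > MAX_EMPTY_READS)
			return -1;
		cursize += (size_t)rsize;
	}
	return 0;
}
```

With a peer that has closed its end, recv() keeps returning 0, so this loop
terminates after 21 empty reads instead of spinning forever.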

Note that datasize, read as four bytes, is ['l', ' ', 'l', 'a']. Concatenating
datasize and data yields part of "Traceback (most recent call last):".

This may indicate a server-side channeledoutput issue. If it is a race
condition, we may want to use flock to protect the channels.

diff --git a/contrib/chg/hgclient.c b/contrib/chg/hgclient.c
--- a/contrib/chg/hgclient.c
+++ b/contrib/chg/hgclient.c
@@ -126,10 +126,15 @@ static void readchannel(hgclient_t *hgc)
 		return;  /* assumes input request */
 
 	size_t cursize = 0;
+	int emptycount = 0;
 	while (cursize < hgc->ctx.datasize) {
 		rsize = recv(hgc->sockfd, hgc->ctx.data + cursize,
 			     hgc->ctx.datasize - cursize, 0);
-		if (rsize < 0)
+		/* rsize == 0 normally indicates EOF, while it's also a valid
+		 * packet size for unix socket. treat it as EOF and abort if
+		 * we get many empty responses in a row. */
+		emptycount = (rsize == 0 ? emptycount + 1 : 0);
+		if (rsize < 0 || emptycount > 20)
 			abortmsg("failed to read data block");
 		cursize += rsize;
 	}

