[PATCH stable] commandserver: handle backlog before exiting

Jun Wu quark at fb.com
Tue Jan 31 15:11:09 EST 2017


Excerpts from Yuya Nishihara's message of 2017-02-01 00:48:26 +0900:
> On Tue, 31 Jan 2017 15:24:42 +0000, Jun Wu wrote:
> > Excerpts from Yuya Nishihara's message of 2017-01-31 23:36:46 +0900:
> > > > This introduces another race condition that unlinks an innocent socket file.
> > > > I will send a V2 later.
> > 
> > (This actually seems to be a regression after the socketserver refactoring.
> > It probably requires sub-classing "unixforkingservice" in chg to solve
> > properly. It's not too trivial and I'll send fixes after the freeze.)
> 
> I don't think SocketServer does anything special for flushing backlog.

Sorry, I missed chgunixservicehandler's customized "unlinksocket"
implementation. So chg should actually work just fine, while commandserver
needs some care to avoid double unlink (which is probably just adding an
"if" to prevent the second unlink).

> > > I have no idea if we can reliably make the socket stop queuing new connections
> > > without destroying the backlog.
> > > 
> > > Instead, maybe the client can handle very first ECONNRESET gracefully?
> > 
> > After "unlink", the client cannot "connect" by a filesystem address. And chg
> > client does connect by a filesystem address. So it should be reliable (not
> > considering Linux-specific black magic /proc/random-pid/fd/n).
> 
> Yeah, but you found there's another race.

That's a separated one (double unlink), not hard to solve.

> > I think the "industry standard" about shutting down is to handle all
> > remaining requests while rejecting new ones. The pattern is seen in nginx,
> > squid, thin (a Ruby on Rails server), OpenStack APIs etc.
> 
> If they do, we can copy. Can you investigate how they solve the problem?
> I tried if shutdown(SHUT_RD) could be used, but it didn't help as expected.

The epoll server (thin / eventmachine) uses "epoll_ctl (epfd, EPOLL_CTL_DEL,
...)". Squid code seems too complex. I guess "shutdown" is for connections
that are the return value of accept (3), while we want to deal with the
"socket" argument of accept (3), so shutdown does not help.

> > It looks better
> > than retrying in the client.
> 
> Yes, but we have retry for connect(), so adding retry for the first recv()
> wouldn't make things much worse.

That's actually interesting as it can probably cover ECONNREFUSED too. I'll
try if it could be implemented cleanly. That said, I would still like the
server-side fix which is simply effective.


More information about the Mercurial-devel mailing list