[PATCH 6 of 8 V5] worker: make sure killworkers runs at most once

Jun Wu quark at fb.com
Wed Nov 16 10:52:24 EST 2016


Excerpts from Yuya Nishihara's message of 2016-11-17 00:34:28 +0900:
> On Wed, 16 Nov 2016 15:18:07 +0000, Jun Wu wrote:
> > Excerpts from Yuya Nishihara's message of 2016-11-17 00:00:01 +0900:
> > > On Wed, 16 Nov 2016 13:57:58 +0000, Jun Wu wrote:
> > > > Excerpts from Yuya Nishihara's message of 2016-11-16 21:25:45 +0900:
> > > > > I meant set.discard(). My idea is:
> > > > > 
> > > > >  a. discard pid by anyone who notices it no longer exists
> > > > >  b. then, kill the rest
> > > > 
> > > > This looks good. Although it's still possible to kill processes multiple
> > > > times (because atomic "wait-and-drop-pid" is impossible).
> > > 
> > > Hmm, signal handlers are stacked at arbitrary code location, there's no
> > > time-based preemption like threading, I think. So even if the wait-discard(p)
> > > sequence is interrupted by another waitforworkers(), the process "p" would
> > 
> > Only one "p" will be discard. There could be multiple "p"s that need to be
> > discarded.
> > 
> >   waitforworkers
> >     p1: need to be discarded
> >       SIGCHLD
> >         waitforworkers
> >         p2: need to be discarded
> >       SIGCHLD
> >         waitforworkers
> 
> Here waitforworkers() should discard p2 because waitpid(p2) would raise ECHILD.

You're correct in this case. It's a bit more complex because the iteration
order of a "set" is undefined. "set.discard" may change the order, so the
nested "waitforworkers" may not visit the same pid visited by the outer
"waitforworkers". sorted(pids.copy()) would guarantee the order. If we take
all known measures (itertools.count, set.discard, sorted), it seems we can
make sure there is at most 1 process who will receive an unnecessary kill.

I think the current code is good enough in terms of correctness. It's also a
bit stricter - only successful waitpid will remove p from the set - if a
(unrealistic) buggy OS allows us to wait a same pid twice, we will complain.
I'm going to leave it as-is.

> >         p3: need to be discarded
> >       SIGCHLD
> >         waitforworkers
> >         p4: need to be discarded
> >       ....
> >     discard p1
> >     killworkers
> >       # kill p2, p3, p4 unnecessarily
> > 
> > As long as there is no way to do "atomic waitpid + set.discard", the problem
> > can not be solved perfectly.
> > 
> > > be discarded before killworkers() thanks to ECHILD. And we do unregister the
> > > SIGCHLD handler before killworkers().


More information about the Mercurial-devel mailing list