[PATCH 6 of 8 V5] worker: make sure killworkers runs at most once
Yuya Nishihara
yuya at tcha.org
Wed Nov 16 07:25:45 EST 2016
On Tue, 15 Nov 2016 22:28:46 +0000, Jun Wu wrote:
> Excerpts from Yuya Nishihara's message of 2016-11-15 23:02:35 +0900:
> > On Tue, 15 Nov 2016 02:39:09 +0000, Jun Wu wrote:
> > > # HG changeset patch
> > > # User Jun Wu <quark at fb.com>
> > > # Date 1479176697 0
> > > # Tue Nov 15 02:24:57 2016 +0000
> > > # Node ID 8402c91c250a9dd369296dcdf00f7b50110ff6ae
> > > # Parent 9dbb3532b173a980f341e41d9e96338a386364e5
> > > # Available At https://bitbucket.org/quark-zju/hg-draft
> > > # hg pull https://bitbucket.org/quark-zju/hg-draft -r 8402c91c250a
> > > worker: make sure killworkers runs at most once
> >
> > > @@ -90,4 +91,5 @@ def _posixworker(ui, func, staticargs, a
> > > signal.signal(signal.SIGINT, signal.SIG_IGN)
> > > pids, problem = set(), [0]
> > > + problemcount = itertools.count()
> > > def killworkers():
> > > # if one worker bails, there's no good reason to wait for the rest
> > > @@ -114,5 +116,8 @@ def _posixworker(ui, func, staticargs, a
> > > pids.remove(p)
> > > st = _exitstatus(st)
> > > - if st and not problem[0]:
> > > + # next(itertools.count()) cannot be interrupted by a Python signal
> > > + # handler - true for both CPython and PyPy. So killworkers() runs
> > > + # at most once.
> > > + if st and next(problemcount) == 0:
> >
> > I'm not sure what this atomic counter buys us. Since we'll remove threading and
> > unregister the SIGCHLD handler before killworkers(), the only concern would be
> > whether pids are collected by waitpid() or not. If waitforworkers() is
> > interrupted between os.waitpid() and pids.remove(), the waited pid could be
> > killed again. But this patch doesn't prevent it.
>
> I agree that the only important thing is whether processes are waited or
> not.
>
> And I was aware of the situation you described (so I used the words "making
> it harder" instead of "making it impossible" in patch 7).
>
> However, this patch prevents killing the "unwaited" processes multiple
> times, which is not that important but is "nice-to-have", I think.
>
> Up to you for decision.
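For reference, the at-most-once guarantee the patch relies on can be sketched
outside the worker code; handle_failure() here is a hypothetical stand-in for
the code path that calls killworkers():

```python
import itertools

# Sketch of the pattern from the patch: on CPython (and PyPy), next()
# on an itertools.count object executes as a single C-level call, so a
# Python signal handler cannot run between the read and the increment.
# Exactly one caller ever observes 0.
problemcount = itertools.count()

def handle_failure():
    # hypothetical stand-in for the SIGCHLD path invoking killworkers()
    if next(problemcount) == 0:
        return "killworkers"   # first failure: run cleanup
    return "skip"              # later failures: cleanup already ran

results = [handle_failure() for _ in range(3)]
# results == ["killworkers", "skip", "skip"]
```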
>
> > two workers; the latter fails:
> > # pids = [2, 3]
> > SIGCHLD-2
> > waitpid(2) -> p = 2, st = ok
> > SIGCHLD-3
> > waitpid(2) -> ECHILD
> > waitpid(3) -> p = 3, st = error
> > pids.remove(3)
> > # pids = [2]
> > killworkers()
> > pids.remove(2)
> >
> > two workers; both fail:
> > # pids = [2, 3]
>
> If pids = [2, 3, 4, 5, 6, 7, 8]. killworkers may kill 4 to 8 twice without
> this patch.
Yes, but killing active/zombie processes twice would be harmless. And I don't
think it happens often enough to justify the special care of an atomic
counter.
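To illustrate why a double kill is harmless, the kill loop only needs to
swallow ESRCH for pids that are already gone; this is a sketch under that
assumption, with kill_once() as a hypothetical helper, not the actual
killworkers() body:

```python
import errno
import os
import signal

def kill_once(pid):
    """Send SIGKILL, treating an already-gone pid as a no-op."""
    try:
        os.kill(pid, signal.SIGKILL)
    except OSError as err:
        if err.errno != errno.ESRCH:
            raise
        return False  # no such process: it was already waited on
    return True

# Demonstration: fork a child, reap it, then "kill" it again.
pid = os.fork()
if pid == 0:
    os._exit(0)              # child exits immediately
os.waitpid(pid, 0)           # parent reaps it; the pid no longer exists
second_kill = kill_once(pid) # harmless: ESRCH is swallowed
```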
> > SIGCHLD-2
> > waitpid(2) -> p = 2, st = error
> > SIGCHLD-3
> > waitpid(2) -> ECHILD
> > waitpid(3) -> p = 3, st = error
> > pids.remove(3)
> > # pids = [2]
> > killworkers()
> > pids.remove(2)
> > # pids = []
> > killworkers()
> >
> > Maybe we can discard pid if waitpid() fails with ECHLD?
>
> If ECHILD happens, it means the child was already waited on by another
> "waitforworkers" invocation, which is responsible for doing "pids.remove(p)".
> Discarding the pid when ECHILD happens would cause one of them to raise
> KeyError.
I meant set.discard(). My idea is:
a. discard pid by anyone who notices it no longer exists
b. then, kill the rest
something like this:
for pid in pids.copy():
    p, st = waitpid(pid, WNOHANG)
    if ECHILD:
        pids.discard(pid)  # already reaped elsewhere
        continue
    if p:
        pids.discard(p)
        if st:
            problem[0] = st
if problem[0]:
    restore_signal()
    killworkers()
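A runnable version of that sketch, assuming `pids` is a set and `problem` a
one-element list as in the worker code; reapworkers() is a placeholder name,
and the restore_signal()/killworkers() steps are left out since they are not
needed to show the discard behaviour:

```python
import errno
import os
import time

def reapworkers(pids, problem):
    # Anyone who notices a pid no longer exists discards it, so a
    # later killworkers() cannot signal an already-reaped process.
    for pid in pids.copy():
        try:
            p, st = os.waitpid(pid, os.WNOHANG)
        except OSError as err:
            if err.errno != errno.ECHILD:
                raise
            pids.discard(pid)  # reaped by a concurrent invocation
            continue
        if p:
            pids.discard(p)
            if st:
                problem[0] = st

# Demonstration: two children, one exiting with a failure status.
pids = set()
for code in (0, 1):
    pid = os.fork()
    if pid == 0:
        os._exit(code)
    pids.add(pid)

time.sleep(0.2)  # let both children exit
problem = [0]
reapworkers(pids, problem)
```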