[PATCH 6 of 8 V5] worker: make sure killworkers runs at most once

Jun Wu quark at fb.com
Tue Nov 15 17:28:46 EST 2016


Excerpts from Yuya Nishihara's message of 2016-11-15 23:02:35 +0900:
> On Tue, 15 Nov 2016 02:39:09 +0000, Jun Wu wrote:
> > # HG changeset patch
> > # User Jun Wu <quark at fb.com>
> > # Date 1479176697 0
> > #      Tue Nov 15 02:24:57 2016 +0000
> > # Node ID 8402c91c250a9dd369296dcdf00f7b50110ff6ae
> > # Parent  9dbb3532b173a980f341e41d9e96338a386364e5
> > # Available At https://bitbucket.org/quark-zju/hg-draft 
> > #              hg pull https://bitbucket.org/quark-zju/hg-draft  -r 8402c91c250a
> > worker: make sure killworkers runs at most once
> 
> > @@ -90,4 +91,5 @@ def _posixworker(ui, func, staticargs, a
> >      signal.signal(signal.SIGINT, signal.SIG_IGN)
> >      pids, problem = set(), [0]
> > +    problemcount = itertools.count()
> >      def killworkers():
> >          # if one worker bails, there's no good reason to wait for the rest
> > @@ -114,5 +116,8 @@ def _posixworker(ui, func, staticargs, a
> >                  pids.remove(p)
> >                  st = _exitstatus(st)
> > -            if st and not problem[0]:
> > +            # next(itertools.count()) cannot be interrupted by a Python signal
> > +            # handler - true for both CPython and PyPy. So killworkers() runs
> > +            # at most once.
> > +            if st and next(problemcount) == 0:
> 
> I'm not sure if this atomic counter buys. Since we'll remove threading and
> unregister SIGCHLD handler before killworkers(), the only concern would be
> whether pids are collected by waitpid() or not. If waitforworkers() is
> interrupted between os.waitpid() and pids.remove(), the waited pid could be
> killed again. But this patch doesn't prevent it.

I agree that the only important thing is whether processes are waited or
not.

And I was aware of the situation you described (so I used the words "making
it harder" instead of "making it impossible" in patch 7).

However, this patch prevents killing the "unwaited" processes multiple
times, which is not that important but is "nice-to-have", I think.

Up to you for decision.

>   two workers; the latter fails:
>   # pids = [2, 3]
>   SIGCHLD-2
>     waitpid(2) -> p = 2, st = ok
>     SIGCHLD-3
>       waitpid(2) -> ECHLD
>       waitpid(3) -> p = 3, st = error
>       pids.remove(3)
>       # pids = [2]
>       killworkers()
>     pids.remove(2)
> 
>   two workers; both fail:
>   # pids = [2, 3]

If pids = [2, 3, 4, 5, 6, 7, 8]. killworkers may kill 4 to 8 twice without
this patch.

>   SIGCHLD-2
>     waitpid(2) -> p = 2, st = error
>     SIGCHLD-3
>       waitpid(2) -> ECHLD
>       waitpid(3) -> p = 3, st = error
>       pids.remove(3)
>       # pids = [2]
>       killworkers()
>     pids.remove(2)
>     # pids = []
>     killworkers()
> 
> Maybe we can discard pid if waitpid() fails with ECHLD?

If ECHILD happens, it means the child is waited by another "waitforworkers",
who is responsible to do "pids.remove(p)". Discarding pid when ECHILD
happens would cause one of them to raise KeyError.

> The series looks good. I'm going to queue these patches. If you agree this
> atomic counter is somewhat redundant, I'll drop it.

Thanks!


More information about the Mercurial-devel mailing list