[PATCH STABLE] worker: ignore meaningless exit status indication returned by os.waitpid()

Fri Feb 24 09:52:26 EST 2017

At Fri, 24 Feb 2017 22:40:14 +0900,
Yuya Nishihara wrote:
> 
> On Fri, 24 Feb 2017 06:05:29 +0900, FUJIWARA Katsunori wrote:
> > # HG changeset patch
> > # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> > # Date 1487883654 -32400
> > #      Fri Feb 24 06:00:54 2017 +0900
> > # Branch stable
> > # Node ID d879917b416a305a42ab92a6d3ac2121d6830560
> > # Parent  aa25989b0658dcefbd4c1bce7c389f006f22af30
> > worker: ignore meaningless exit status indication returned by os.waitpid()
> > 
> > Before this patch, worker implementation assumes that os.waitpid()
> > with os.WNOHANG returns '(0, 0)' for still running child process. This
> > is explicitly specified as below in Python API document.
> > 
> >     os.WNOHANG
> >         The option for waitpid() to return immediately if no child
> >         process status is available immediately. The function returns
> >         (0, 0) in this case.
> > 
> > On the other hand, POSIX specification doesn't define the "stat_loc"
> > value returned by waitpid() with WNOHANG for such child process.
> > 
> >     http://pubs.opengroup.org/onlinepubs/9699919799/functions/waitpid.html
> > 
> > CPython implementation for os.waitpid() on POSIX doesn't take any care
> > of this gap, and this may cause unexpected "exit status indication"
> > even on POSIX conformance platform.
> > 
> > For example, os.waitpid() with os.WNOHANG returns non-zero "exit
> > status indication" on FreeBSD. This implies os.kill() with own pid or
> > sys.exit() with non-zero exit code, even if no child process fails.
> 
> Thanks for the detailed explanation. This looks good except for the assertion.
> 
> > This patch also adds "assert not st", to detect this kind of breakage
> > immediately. In this unexpected case, blocking os.waitpid() with
> > explicit pid returns '(0, non-zero)' without any exception.
> > 
> > diff --git a/mercurial/worker.py b/mercurial/worker.py
> > --- a/mercurial/worker.py
> > +++ b/mercurial/worker.py
> > @@ -123,6 +123,15 @@ def _posixworker(ui, func, staticargs, a
> >              if p:
> >                  pids.discard(p)
> >                  st = _exitstatus(st)
> > +            elif not blocking:
> > +                # ignore st in this case, because it might be non-zero
> > +                # on some platforms, even though it should be zero
> > +                # according to Python document
> > +                #
> > +                # See also https://bugs.python.org/issue27808
> > +                continue
> > +            else:
> > +                assert not st
> >              if st and not problem[0]:
> >                  problem[0] = st
> 
> This assertion is confusing. You said "blocking os.waitpid() returns
> '(0, non-zero)'", where p == 0 should be invalid. But the code says st should
> be zero (since we've tested p == 0 beforehand and we do nothing if st is zero.)
> 
> Moreover, we can take waitpid(pid, whatever) a function that may return
> 0 if child isn't finished yet. It makes zero sense to test the 'st' value if
> p == 0. So I don't think there's a practical benefit to handle blocking and
> non-blocking cases differently.

OK, I don't have strong opinion to add this assertion.

I'll post revised one.

-- 
----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp