[PATCH 3 of 3] worker: handle worker failures more aggressively

Idan Kamara idankk86 at gmail.com
Tue Feb 19 13:00:45 CST 2013


On Tue, Feb 19, 2013 at 8:29 PM, Bryan O'Sullivan <bos at serpentine.com>
wrote:
>
> # HG changeset patch
> # User Bryan O'Sullivan <bryano at fb.com>
> # Date 1361297550 28800
> # Node ID 42c14cff887e20d033dbaa8f8c00100e807a1149
> # Parent  9ef52f0a93a0cba939742743ff59e4c2a2463fab
> worker: handle worker failures more aggressively
>
> We now wait for worker processes in a separate thread, so that we can
> spot failures in a timely way, wihout waiting for the progress pipe
> to drain.
>
> If a worker fails, we recover the pre-parallel-update behaviour of
> failing early by killing its peers before propagating the failure.
>
> diff --git a/mercurial/worker.py b/mercurial/worker.py
> --- a/mercurial/worker.py
> +++ b/mercurial/worker.py
> @@ -6,7 +6,7 @@
>  # GNU General Public License version 2 or any later version.
>
>  from i18n import _
> -import os, signal, sys, util
> +import os, signal, sys, threading, util
>
>  def countcpus():
>      '''try to count the number of CPUs on the system'''
> @@ -77,6 +77,7 @@ def _posixworker(ui, func, staticargs, a
>      workers = _numworkers(ui)
>      oldhandler = signal.getsignal(signal.SIGINT)
>      signal.signal(signal.SIGINT, signal.SIG_IGN)
> +    pids, problem = [], [0]
>      for pargs in partition(args, workers):
>          pid = os.fork()
>          if pid == 0:
> @@ -88,25 +89,40 @@ def _posixworker(ui, func, staticargs, a
>                  os._exit(0)
>              except KeyboardInterrupt:
>                  os._exit(255)
> +        pids.append(pid)
> +    pids.reverse()

Ok, so the last created child will be the first in pids.

>      os.close(wfd)
>      fp = os.fdopen(rfd, 'rb', 0)
> -    def cleanup():
> -        # python 2.4 is too dumb for try/yield/finally
> -        signal.signal(signal.SIGINT, oldhandler)
> -        problem = None
> -        for i in xrange(workers):
> +    def killworkers():
> +        # if one worker bails, there's no good reason to wait for the
> rest
> +        for p in pids:
> +            try:
> +                os.kill(p, signal.SIGTERM)
> +            except OSError, err:
> +                if err.errno != errno.ESRCH:
> +                    raise
> +    def waitforworkers():
> +        for p in pids:
>              pid, st = os.wait()

And here you're waiting for it to finish, but what happens
if for some reason one of the previous children fails first?

Why not use select on the children and also spare the thread?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20130219/fa6e75e1/attachment.html>


More information about the Mercurial-devel mailing list