Opinion needed: multiprocessing usage

Gregory Szorc gregory.szorc at gmail.com
Mon Nov 11 22:35:25 EST 2019


On Mon, Nov 11, 2019 at 6:32 AM Augie Fackler <raf at durin42.com> wrote:

> (+indygreg)
>
> > On Nov 11, 2019, at 03:04, Pierre-Yves David <
> pierre-yves.david at ens-lyon.org> wrote:
> >
> > Hi everyone,
> >
> > I am looking into introducing parallelism into `hg debugupgraderepo`. I
> already have a very useful prototype that precomputes copy information in
> parallel when converting to side-data storage. That prototype uses
> multiprocessing because it is part of the stdlib and works quite well for
> this use case.
> >
> > However, I know we have refrained from using multiprocessing in the
> past. I know the import and bootstrap cost was too heavy for things like
> `hg update`. However, I am not sure whether there are other reasons to
> rule out the multiprocessing module in the `hg debugupgraderepo` case.
>
> I have basically only ever heard bad things about multiprocessing,
> especially on Windows which is the platform where you'd expect it to be the
> most useful (since there's no fork()). I think Greg has more details in his
> head.
>
> That said, I guess feel free to experiment, in the knowledge that it
> probably isn't significantly better than our extant worker system?
>

multiprocessing is a pit of despair on Python 2.7. It is a bit better on
Python 3. But I still don't trust it. I think you are better off using
`concurrent.futures.ProcessPoolExecutor`.
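
For illustration, a minimal sketch of what that could look like, assuming
Python 3 and the stdlib `concurrent.futures`. The names `map_in_processes`,
`compute_copies` and `revisions` are made up for this example and are not
part of Mercurial:

```python
import concurrent.futures


def map_in_processes(func, items, max_workers=None,
                     executor_cls=concurrent.futures.ProcessPoolExecutor):
    """Apply func to every item using a pool; return results in input order.

    With a process pool, func and the items must be picklable, so func has
    to live at module level (not a lambda or a nested function).
    """
    with executor_cls(max_workers=max_workers) as executor:
        # Executor.map yields results in the order of the inputs.
        return list(executor.map(func, items))


# Example call (compute_copies and revisions are hypothetical). On platforms
# without fork() the main module is re-imported in each worker, so the call
# has to sit under a __main__ guard:
#
#     if __name__ == "__main__":
#         results = map_in_processes(compute_copies, revisions)
```

The `executor_cls` parameter is only there so the same helper can be tried
with `ThreadPoolExecutor` when process startup is the thing being debugged.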

But I'm not even sure I trust ProcessPoolExecutor on Windows, especially
when `sys.executable` is `hg.exe` instead of `python.exe`: I think both
multiprocessing and concurrent.futures make assumptions about how to invoke
the "run a worker" code in a new process that are invalidated when the main
process isn't `python.exe`.

So I think we may have to roll our own "start a worker" code. The solution
that's been bouncing around in my head is to add a `hg debugworker` command
(or similar) that dispatches work read from a pipe/file descriptor/temp
file to a named <module>.<function> callable. We then implement a custom
executor conforming to the interface that concurrent.futures wants and we
use that for work dispatch. One of the hardest parts here is implementing a
fair work scheduler. There are all kinds of gnarly problems involving
buffering, permissions, cross-platform differences, etc. Even Rust didn't
have a good cross-platform library for this type of message passing when I
asked a few months ago (I was advised to use something like 0mq, which
made me sad). Maybe there is a reasonable Python library we
can vendor. But I suspect we'll find limitations in any implementation, as
this is a subtly hard problem.
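
To make the shape of that idea concrete, here is a rough sketch, not a
proposal for the actual implementation. Everything in it is an assumption
made for illustration: the one-JSON-object-per-line protocol, the names
`resolve_callable`, `handle_request`, `worker_main` and
`PipeWorkerExecutor`, and the `hg debugworker` command line, which does not
exist today. It also sidesteps the hard part entirely: there is no
scheduling or fairness, just one short-lived process per task.

```python
import importlib
import json
import subprocess
import threading
from concurrent.futures import Executor, Future


def resolve_callable(dotted_name):
    """Turn "package.module.function" into the callable it names."""
    module_name, _, func_name = dotted_name.rpartition(".")
    return getattr(importlib.import_module(module_name), func_name)


def handle_request(line):
    """Run one JSON request and return the JSON reply as a single line."""
    try:
        request = json.loads(line)
        func = resolve_callable(request["target"])
        return json.dumps({"ok": True, "value": func(*request["args"])})
    except Exception as exc:  # report failures to the parent instead of dying
        return json.dumps({"ok": False, "error": repr(exc)})


def worker_main(instream, outstream):
    """Worker loop: one request per input line, one reply per output line.

    A real `hg debugworker` would call this with sys.stdin and sys.stdout.
    """
    for line in instream:
        outstream.write(handle_request(line) + "\n")
        outstream.flush()  # keep replies from getting stuck in a buffer


class PipeWorkerExecutor(Executor):
    """Naive executor: one worker process per submitted task, no scheduling."""

    def __init__(self, command):
        self._command = list(command)  # e.g. ["hg", "debugworker"] (hypothetical)

    def submit(self, target, *args):
        # Unlike the stock executors, `target` is a dotted name, not a function.
        future = Future()
        threading.Thread(target=self._run, args=(future, target, args),
                         daemon=True).start()
        return future

    def _run(self, future, target, args):
        if not future.set_running_or_notify_cancel():
            return  # the future was cancelled before the worker started
        request = json.dumps({"target": target, "args": list(args)}) + "\n"
        try:
            proc = subprocess.run(self._command, input=request, text=True,
                                  capture_output=True, check=True)
            reply = json.loads(proc.stdout)
        except Exception as exc:
            future.set_exception(exc)
            return
        if reply["ok"]:
            future.set_result(reply["value"])
        else:
            future.set_exception(RuntimeError(reply["error"]))
```

Because it subclasses `Executor`, callers would get the usual `Future`
objects back and could wait on them with `concurrent.futures.as_completed()`.
A real version would need a bounded pool of long-lived workers, and that is
where the buffering and fairness problems show up.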


>
> >
> > Cheers,
> >
> > --
> > Pierre-Yves David
> > _______________________________________________
> > Mercurial-devel mailing list
> > Mercurial-devel at mercurial-scm.org
> > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
>