Opinion needed: multiprocessing usage

Augie Fackler raf at durin42.com
Fri Nov 29 14:45:40 EST 2019


On Fri, Nov 29, 2019, 06:45 Pierre-Yves David <
pierre-yves.david at ens-lyon.org> wrote:

>
>
> On 11/12/19 4:35 AM, Gregory Szorc wrote:
> > On Mon, Nov 11, 2019 at 6:32 AM Augie Fackler <raf at durin42.com
> > <mailto:raf at durin42.com>> wrote:
> >
> >     (+indygreg)
> >
> >      > On Nov 11, 2019, at 03:04, Pierre-Yves David
> >     <pierre-yves.david at ens-lyon.org
> >     <mailto:pierre-yves.david at ens-lyon.org>> wrote:
> >      >
> >      > Hi everyone,
> >      >
> >      > I am looking into introducing parallelism into `hg
> >     debugupgraderepo`. I already have a very useful prototype that
> >     precomputes copies information in parallel when converting to
> >     side-data storage. That prototype uses multiprocessing because it is
> >     part of the stdlib and works quite well for this use case.
> >      >
> >      > However, I know we have refrained from using multiprocessing in
> >     the past. I know the import and bootstrap cost was too heavy for
> >     things like `hg update`. However, I am not sure if there are other
> >     reasons to rule out the multiprocessing module in the `hg
> >     debugupgraderepo` case.
> >
> >     I have basically only ever heard bad things about multiprocessing,
> >     especially on Windows which is the platform where you'd expect it to
> >     be the most useful (since there's no fork()). I think Greg has more
> >     details in his head.
> >
> >     That said, I guess feel free to experiment, in the knowledge that it
> >     probably isn't significantly better than our extant worker system?
> >
> >
> > multiprocessing is a pit of despair on Python 2.7. It is a bit better on
> > Python 3. But I still don't trust it. I think you are better off using
> > `concurrent.futures.ProcessPoolExecutor`.
>
> That looks great, but it is not available in Python 2.7.
>

There's a backport of the 3.x concurrent.futures package available on PyPI,
and AIUI it includes some important bug fixes that never landed in 2.x.
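
Untested sketch of what that could look like for the copies precompute,
assuming the backported package; the helper names here are made up, not
anything that exists in core:

    # sketch only: on 2.7 this needs the `futures` backport from PyPI
    import concurrent.futures

    def _copiesforrev(rev):
        # stand-in for the real per-revision copy/side-data computation
        return rev

    def precomputecopies(revs, maxworkers=4):
        # the executor pickles the callable and its arguments and ships
        # them to worker processes, so everything submitted has to be
        # importable at module level (one of the Windows sharp edges)
        with concurrent.futures.ProcessPoolExecutor(maxworkers) as pool:
            futs = dict((pool.submit(_copiesforrev, r), r) for r in revs)
            results = {}
            for fut in concurrent.futures.as_completed(futs):
                results[futs[fut]] = fut.result()
            return results

No idea yet how it behaves when `sys.executable` is `hg.exe`, which is
exactly the concern below.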


> > But I'm not even sure I trust ProcessPoolExecutor on Windows, especially
> > when `sys.executable` is `hg.exe` instead of `python.exe`: I think both
> > multiprocessing and concurrent.futures make assumptions about how to
> > invoke the "run a worker" code on a new process that are invalidated
> > when the main process isn't `python.exe`.
>
> That's unfortunate :-/ Any way to reliably test this and get it fixed
> upstream?
>
> > So I think we may have to roll our own "start a worker" code. The
> > solution that's been bouncing around in my head is to add a `hg
> > debugworker` command (or similar) that dispatches work read from a
> > pipe/file descriptor/temp file to a named <module>.<function> callable.
> > We then implement a custom executor conforming to the interface that
> > concurrent.futures wants and use that for work dispatch. One of the
> > hardest parts here is implementing a fair work scheduler. There are all
> > kinds of gnarly problems involving buffering, permissions, cross-platform
> > differences, etc. Even Rust didn't have a good cross-platform library for
> > this type of message passing last time I asked (a few months ago, I was
> > advised to use something like 0mq, which made me sad). Maybe there is a
> > reasonable Python library we can vendor. But I
> > suspect we'll find limitations in any implementation, as this is a
> > subtly hard problem.
>
> Yeah, the problem is hard enough that I would rather have an external
> library dealing with it.
>
> --
> Pierre-Yves David
>
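
For the custom-dispatch idea Greg describes above, the rough shape I
picture is something like the following. To be clear, this is entirely
hypothetical: `hg debugworker` doesn't exist, and the "protocol" here
(pickle over stdin/stdout) is hand-waved:

    # very rough sketch of an executor-ish wrapper around a hypothetical
    # `hg debugworker <module>.<function>` command
    import pickle
    import subprocess

    from concurrent.futures import Future

    class hgworkerexecutor(object):
        """Dispatch named callables to `hg debugworker` child processes."""

        def __init__(self, hgexecutable='hg'):
            self._hg = hgexecutable

        def submit(self, qualname, *args, **kwargs):
            # qualname is a '<module>.<function>' string the child resolves
            fut = Future()
            proc = subprocess.Popen(
                [self._hg, 'debugworker', qualname],
                stdin=subprocess.PIPE, stdout=subprocess.PIPE)
            out, _err = proc.communicate(pickle.dumps((args, kwargs)))
            if proc.returncode:
                fut.set_exception(
                    RuntimeError('worker exited %d' % proc.returncode))
            else:
                fut.set_result(pickle.loads(out))
            return fut

This blocks inside submit(), so it isn't a real executor yet; keeping a
pool of children alive and scheduling work fairly across them is exactly
the hard part mentioned above.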

