Opinion needed: multiprocessing usage
Gregory Szorc
gregory.szorc at gmail.com
Fri Nov 29 14:51:30 EST 2019
> On Nov 29, 2019, at 11:46, Augie Fackler <raf at durin42.com> wrote:
>
>> On Fri, Nov 29, 2019, 06:45 Pierre-Yves David <pierre-yves.david at ens-lyon.org> wrote:
>>
>>
>> On 11/12/19 4:35 AM, Gregory Szorc wrote:
>> > On Mon, Nov 11, 2019 at 6:32 AM Augie Fackler <raf at durin42.com
>> > <mailto:raf at durin42.com>> wrote:
>> >
>> > (+indygreg)
>> >
>> > > On Nov 11, 2019, at 03:04, Pierre-Yves David
>> > <pierre-yves.david at ens-lyon.org
>> > <mailto:pierre-yves.david at ens-lyon.org>> wrote:
>> > >
>> > > Hi everyone,
>> > >
>> > > I am looking into introducing parallelism into `hg
>> > debugupgraderepo`. I already have a very useful prototype that
>> > precomputes copy information in parallel when converting to side-data
>> > storage. That prototype uses multiprocessing because it is part of
>> > the stdlib and works quite well for this use case.
>> > >
>> > > However, I know we have refrained from using multiprocessing in
>> > the past. I know the import and bootstrap cost was too heavy for
>> > things like `hg update`. However, I am not sure if there are other
>> > reasons to rule out the multiprocessing module in the `hg
>> > debugupgraderepo` case.
>> >
>> > I have basically only ever heard bad things about multiprocessing,
>> > especially on Windows which is the platform where you'd expect it to
>> > be the most useful (since there's no fork()). I think Greg has more
>> > details in his head.
>> >
>> > That said, I guess feel free to experiment, in the knowledge that it
>> > probably isn't significantly better than our extant worker system?
>> >
>> >
>> > multiprocessing is a pit of despair on Python 2.7. It is a bit better on
>> > Python 3. But I still don't trust it. I think you are better off using
>> > `concurrent.futures.ProcessPoolExecutor`.
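
The suggestion above can be sketched as follows. This is a minimal, hypothetical illustration of using `concurrent.futures.ProcessPoolExecutor` for per-revision work; the `_compute_copies` function and its inputs are stand-ins, not actual hg code.

```python
# Minimal sketch of the ProcessPoolExecutor approach suggested above.
# _compute_copies is a hypothetical stand-in for per-revision copy tracing.
import concurrent.futures


def _compute_copies(rev):
    # Placeholder for the real per-revision work.
    return rev * rev


def main():
    revs = range(10)
    # Each item is pickled and dispatched to a worker process; results
    # come back in input order.
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as pool:
        return list(pool.map(_compute_copies, revs))


if __name__ == "__main__":
    print(main())
```

Note that the callable passed to `map` must be picklable (defined at module top level), which is one of the constraints that makes process-based parallelism awkward inside a large application.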
>>
>> That looks great, but this is not available in python-2.7
>
>
> There's a backport of the 3.x concurrent futures available on pypi, and AIUI it fixes some important bugs in the package that didn't ever land in 2.x.
We have it vendored :)
Only used on Python 2 via pycompat shim IIRC.
>
>>
>> > But I'm not even sure I trust ProcessPoolExecutor on Windows, especially
>> > when `sys.executable` is `hg.exe` instead of `python.exe`: I think both
>> > multiprocessing and concurrent.futures make assumptions about how to
>> > invoke the "run a worker" code on a new process that is invalidated when
>> > the main process isn't `python.exe`.
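
To make the concern concrete: with the "spawn" start method (the only one on Windows), multiprocessing re-launches `sys.executable` to bootstrap each worker, which breaks when that executable is a frozen `hg.exe` rather than `python.exe`. The module offers partial escape hatches (`freeze_support()`, `set_executable()`); whether they suffice for hg is exactly the open question here. A hedged sketch, with a trivial stand-in worker:

```python
# Sketch of the Windows spawn concern described above. The worker is a
# trivial stand-in; nothing here is hg code.
import multiprocessing


def _worker(x):
    # Placeholder for real per-item work.
    return x + 1


def run_pool():
    # freeze_support() must run early in a frozen Windows program so the
    # re-spawned process executes the worker bootstrap instead of
    # re-running the main program; it is a no-op everywhere else.
    multiprocessing.freeze_support()
    # multiprocessing.set_executable() can point the spawn machinery at a
    # different interpreter, but that assumes a python.exe is available,
    # which is not a given for a standalone hg.exe install.
    with multiprocessing.Pool(2) as pool:
        return pool.map(_worker, [1, 2, 3])


if __name__ == "__main__":
    print(run_pool())
```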
>>
>> That's unfortunate :-/ Any way to reliably test this and get it fixed
>> upstream?
>>
>> > So I think we may have to roll our own "start a worker" code. The
>> > solution that's been bouncing around in my head is to add a `hg
>> > debugworker` command (or similar) that dispatches work read from a
>> > pipe/file descriptor/temp file to a named <module>.<function> callable.
>> > We then implement a custom executor conforming to the interface that
>> > concurrent.futures wants and we use that for work dispatch. One of the
>> > hardest parts here is implementing a fair work scheduler. There are all
>> > kinds of gnarly problems involving buffering, permissions, cross
>> > platform differences, etc. Even Rust doesn't have a good cross-platform
>> > library for this type of message passing (when I asked a few months
>> > ago, I was advised to use something like 0mq, which made me
>> > sad). Maybe there is a reasonable Python library we can vendor. But I
>> > suspect we'll find limitations in any implementation, as this is a
>> > subtly hard problem.
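
The proposed design could be sketched as a custom `concurrent.futures.Executor` subclass. Note this is purely illustrative: `hg debugworker` does not exist, and the thread-based "worker" below merely stands in for a spawned process reading jobs from a pipe.

```python
# Hypothetical sketch of the design proposed above: an executor that
# conforms to the concurrent.futures interface but owns its own dispatch.
# A real version would spawn `hg debugworker` processes and serialize a
# <module>.<function> name plus arguments over a pipe; here a thread
# stands in for that worker so the sketch is self-contained.
import concurrent.futures
import threading


class DebugWorkerExecutor(concurrent.futures.Executor):
    def submit(self, fn, *args, **kwargs):
        future = concurrent.futures.Future()

        def run():
            # Stand-in for writing the job to a worker's pipe and
            # reading back the result.
            try:
                future.set_result(fn(*args, **kwargs))
            except BaseException as exc:
                future.set_exception(exc)

        threading.Thread(target=run).start()
        return future
```

The payoff of conforming to the `Executor` interface is that higher-level helpers like `concurrent.futures.as_completed()` work unchanged, regardless of how workers are actually launched.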
>>
>> Yeah, the problem is hard enough that I would rather have an external
>> library dealing with it.
>>
>> --
>> Pierre-Yves David