Opinion needed: multiprocessing usage

Gregory Szorc gregory.szorc at gmail.com
Fri Nov 29 14:51:30 EST 2019



> On Nov 29, 2019, at 11:46, Augie Fackler <raf at durin42.com> wrote:
> 
> 
> 
> 
>> On Fri, Nov 29, 2019, 06:45 Pierre-Yves David <pierre-yves.david at ens-lyon.org> wrote:
>> 
>> 
>> On 11/12/19 4:35 AM, Gregory Szorc wrote:
>> > On Mon, Nov 11, 2019 at 6:32 AM Augie Fackler <raf at durin42.com> wrote:
>> > 
>> >     (+indygreg)
>> > 
>> >      > On Nov 11, 2019, at 03:04, Pierre-Yves David
>> >     <pierre-yves.david at ens-lyon.org> wrote:
>> >      >
>> >      > Hi everyone,
>> >      >
>> >      > I am looking into introducing parallelism into `hg
>> >     debugupgraderepo`. I already have a very useful prototype that
>> >     precomputes copy information in parallel when converting to side-data
>> >     storage. That prototype uses multiprocessing because it is part of
>> >     the stdlib and works quite well for this use case.
>> >      >
>> >      > However, I know we have refrained from using multiprocessing in
>> >     the past. I know the import and bootstrap cost was too heavy for
>> >     things like `hg update`. However, I am not sure whether there are
>> >     other reasons to rule out the multiprocessing module in the `hg
>> >     debugupgraderepo` case.
>> > 
>> >     I have basically only ever heard bad things about multiprocessing,
>> >     especially on Windows which is the platform where you'd expect it to
>> >     be the most useful (since there's no fork()). I think Greg has more
>> >     details in his head.
>> > 
>> >     That said, I guess feel free to experiment, in the knowledge that it
>> >     probably isn't significantly better than our extant worker system?
>> > 
>> > 
>> > multiprocessing is a pit of despair on Python 2.7. It is a bit better on 
>> > Python 3. But I still don't trust it. I think you are better off using 
>> > `concurrent.futures.ProcessPoolExecutor`.
>> 
>> That looks great, but it is not available in Python 2.7.
> 
> 
> There's a backport of the 3.x concurrent.futures available on PyPI, and AIUI it fixes some important bugs in the package that didn't ever land in 2.x. 

We have it vendored :)

It is only used on Python 2, via the pycompat shim, IIRC.
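
For readers unfamiliar with this kind of shim, here is a minimal sketch of
the general pattern, not Mercurial's actual pycompat code. The vendored
module path `thirdparty.concurrent` and the helper names `square` and
`run_squares` are assumptions made up for illustration. It uses a thread
pool only to keep the demo small; the same `futures` name would also expose
ProcessPoolExecutor.

```python
# Sketch of a compatibility shim: callers import `futures` from one place
# and do not care whether it is the stdlib module or a vendored backport.
try:
    # Python 3: the standard library provides concurrent.futures.
    from concurrent import futures
except ImportError:
    # Python 2: fall back to a vendored backport.
    # NOTE: this module path is hypothetical, chosen for illustration only.
    from thirdparty.concurrent import futures


def square(x):
    """Trivial unit of work for the demo."""
    return x * x


def run_squares(values, max_workers=2):
    """Map square() over values using an executor from the shimmed module."""
    with futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map() yields results in the order of the inputs.
        return list(pool.map(square, values))
```

Calling `run_squares([1, 2, 3])` goes through whichever implementation the
import picked up, which is the whole point of the shim.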

> 
>> 
>> > But I'm not even sure I trust ProcessPoolExecutor on Windows, especially 
>> > when `sys.executable` is `hg.exe` instead of `python.exe`: I think both 
>> > multiprocessing and concurrent.futures make assumptions about how to 
>> > invoke the "run a worker" code on a new process that is invalidated when 
>> > the main process isn't `python.exe`.
>> 
>> That's unfortunate :-/ Is there any way to reliably test this and get it 
>> fixed upstream?
>> 
>> > So I think we may have to roll our own "start a worker" code. The 
>> > solution that's been bouncing around in my head is to add a `hg 
>> > debugworker` command (or similar) that dispatches work read from a 
>> > pipe/file descriptor/temp file to a named <module>.<function> callable. 
>> > We then implement a custom executor conforming to the interface that 
>> > concurrent.futures wants and we use that for work dispatch. One of the 
>> > hardest parts here is implementing a fair work scheduler. There are all 
>> > kinds of gnarly problems involving buffering, permissions, cross 
>> > platform differences, etc. Even Rust didn't have a good cross-platform 
>> > library for this type of message passing when I asked a few months 
>> > ago (I was advised to use something like 0mq, which made me 
>> > sad). Maybe there is a reasonable Python library we can vendor. But I 
>> > suspect we'll find limitations in any implementation, as this is a 
>> > subtly hard problem.
>> 
>> Yeah, the problem is hard enough that I would rather have an external 
>> library deal with it.
>> 
>> -- 
>> Pierre-Yves David
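
To make the `hg debugworker` idea quoted above a bit more concrete, here is
a rough sketch under several assumptions; it is not an existing Mercurial
API. The names `WORKER_SOURCE`, `_run_in_subprocess` and
`SubprocessExecutor` are hypothetical. The worker is started with
`sys.executable -c` as a stand-in for a real `hg debugworker` command, tasks
are passed as JSON over stdin/stdout, and one process is spawned per task.
It deliberately ignores the hard parts mentioned in the thread (fair
scheduling, buffering, permissions, platform differences). Note also that
submit() takes a dotted "<module>.<function>" name rather than a callable,
which deviates from the usual concurrent.futures convention.

```python
import json
import subprocess
import sys
from concurrent.futures import Executor, ThreadPoolExecutor

# Code run by each worker process: read one JSON task from stdin, resolve the
# dotted "<module>.<function>" name, call it, and write the result to stdout.
WORKER_SOURCE = r"""
import importlib, json, sys
task = json.load(sys.stdin)
modname, funcname = task["callable"].rsplit(".", 1)
func = getattr(importlib.import_module(modname), funcname)
json.dump({"result": func(*task["args"])}, sys.stdout)
"""


def _run_in_subprocess(name, args):
    """Run one task in a fresh worker process and return its result."""
    proc = subprocess.run(
        # Stand-in for something like ["hg", "debugworker"] (hypothetical).
        [sys.executable, "-c", WORKER_SOURCE],
        input=json.dumps({"callable": name, "args": list(args)}),
        capture_output=True,
        text=True,
        check=True,  # a failing worker raises CalledProcessError
    )
    return json.loads(proc.stdout)["result"]


class SubprocessExecutor(Executor):
    """Executor that runs named callables in separate worker processes.

    Helper threads only wait on the child processes; the work itself
    happens in the children.
    """

    def __init__(self, max_workers=2):
        self._threads = ThreadPoolExecutor(max_workers=max_workers)

    def submit(self, name, *args):
        # Returns a concurrent.futures.Future, as the interface expects.
        return self._threads.submit(_run_in_subprocess, name, args)

    def shutdown(self, wait=True):
        self._threads.shutdown(wait=wait)
```

Usage would look like `SubprocessExecutor().submit("math.sqrt", 16).result()`.
Because the class derives from `Executor`, the inherited `map()` and
context-manager support work on top of `submit()` without extra code.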

