run-tests.py refactor
Sean Farley
sean.michael.farley at gmail.com
Tue May 13 19:45:50 CDT 2014
Gregory Szorc <gregory.szorc at gmail.com> writes:
> On 5/10/2014 12:14 PM, Gregory Szorc wrote:
>> On 4/25/2014 9:30 PM, Yuya Nishihara wrote:
>>> On Fri, 25 Apr 2014 14:46:01 -0700, Gregory Szorc wrote:
>>>> On 4/25/2014 6:10 AM, Martin Geisler wrote:
>>>>> Gregory Szorc <gregory.szorc at gmail.com> writes:
>>>>>
>>>>>> On my i7-2600K mech HD w/ -l -j8:
>>>>>>
>>>>>> Original:
>>>>>>
>>>>>> real 6m40.558s user 31m31.405s sys 17m48.862s
>>>>>
>>>>> I normally run with -j 20 and get a real time of 4m24s. I find that
>>>>> there's so much IO going on that I can start that many processes in
>>>>> parallel to really saturate the CPU (maybe -j 15 would be the
>>>>> same).
>>>>
>>>> I created a tmpfs to isolate the impact of I/O from my mechanical HD.
>>>> My 4+4 cores were largely saturated during -j8. There were only brief
>>>> periods (maybe ~20s total) where idle CPU was above 5%. Raising it to
>>>> -j10 made the lulls go away.
>>>>
>>>> Total execution time on the tmpfs was about the same as the mechanical
>>>> HD. I can only conclude that the tests are CPU bottlenecked. My
>>>> previous measurements indicate the bulk of that is inside `hg`.
>>>>
>>>> I may very well write a feature that invokes the Python profiler on
>>>> all the `hg` invocations and sums the results. Could be interesting.
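
The "profile every invocation and sum the results" idea quoted above could be sketched with the standard library's cProfile and pstats modules. This is only an illustration under assumptions: the helper names and the stand-in workloads are hypothetical, and nothing here is taken from run-tests.py or the patch series.

```python
import cProfile
import pstats


def profile_to_file(func, path):
    """Run func under cProfile and dump the raw stats to path."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    profiler.dump_stats(path)


def merged_stats(paths):
    """Sum several dumped profiles into a single pstats.Stats object."""
    stats = pstats.Stats(paths[0])
    for path in paths[1:]:
        stats.add(path)  # accumulates call counts and timings
    return stats
```

In a real run each dump would come from one profiled `hg` process rather than an in-process function, and `merged_stats(paths).sort_stats("cumulative").print_stats(20)` would list the most expensive calls across all of them.
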
>>>
>>> IIRC, Python's slow module lookup contributes to the startup time of the
>>> `hg` process. It might be partly an I/O bottleneck.
>>
>> Further measurement indicates that spawning Python processes and
>> importing the mercurial modules introduces significant overhead to
>> running tests.
>>
>> I ran `time python -c '1'` and `time python hg version` to measure the
>> overhead of new Python processes and the most simple Mercurial command,
>> respectively. I tested things on my desktop Sandy Bridge i7-2600K (few
>> years old now) running Ubuntu 14.04 and a more modern 2013 Haswell MBP.
>> I ran things several times and obtained the median. I didn't use an
>> hgrc, so no extensions were loaded.
>>
>> Machine  Distribution   Python Time  HG Time
>> desktop  cpython 2.7.6  40ms         87ms
>> desktop  cpython 3.3.5  60ms         N/A
>> MBP      cpython 2.7.6  16ms         44ms
>> MBP      cpython 3.3.5  27ms         N/A
>>
>> (Note that 3.3.5 is slower than 2.7.6!)
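
The "run it several times and take the median" measurement quoted above can be approximated with a small script. This is a rough sketch, not the method actually used for the table: the function name and the default of 11 runs are assumptions, and wall-clock timing from a parent process includes a little subprocess overhead that the shell's `time` would not.

```python
import statistics
import subprocess
import sys
import time


def median_startup_ms(argv, runs=11):
    """Return the median wall-clock time in milliseconds to run argv."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


if __name__ == "__main__":
    # Equivalent of `time python -c '1'` for the current interpreter.
    bare = median_startup_ms([sys.executable, "-c", "1"])
    print("python -c 1: %.1f ms" % bare)
```

To time `hg version` the same way, pass the interpreter, the path to the `hg` script in a checkout, and `version` as argv.
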
>>
>> We have nearly 13,000 `hg` invocations in our .t tests. If you
>> extrapolate things out, the overhead for new Python processes is thus:
>>
>> Machine  Distribution   Python  HG
>> desktop  cpython 2.7.6  520s    1,131s
>> desktop  cpython 3.3.5  780s    N/A
>> MBP      cpython 2.7.6  208s    572s
>> MBP      cpython 3.3.5  351s    N/A
>>
>> We can then compare these numbers to the total CPU time of the Mercurial
>> test suite to see what share of it goes to starting new Python processes
>> and new HG processes.
>>
>> Machine  Test CPU Time  Python %  HG %
>> desktop  2,960s         17.6      38.2
>> MBP      1,855s         11.2      30.8
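
The extrapolation in the tables above is plain arithmetic and can be reproduced as follows. The sketch treats "nearly 13,000" invocations as exactly 13,000, which is an assumption, and uses only the cpython 2.7.6 figures quoted in the message; the variable and function names are illustrative.

```python
INVOCATIONS = 13000  # assumed exact; the message says "nearly 13,000"

# Median startup times in milliseconds (cpython 2.7.6 rows).
startup_ms = {
    "desktop": {"python": 40, "hg": 87},
    "MBP": {"python": 16, "hg": 44},
}

# Total test suite CPU time in seconds.
suite_cpu_s = {"desktop": 2960, "MBP": 1855}


def overhead(machine):
    """Map each kind to (total seconds, percent of suite CPU time)."""
    result = {}
    for kind, ms in startup_ms[machine].items():
        total_s = INVOCATIONS * ms / 1000.0
        result[kind] = (total_s, 100.0 * total_s / suite_cpu_s[machine])
    return result


if __name__ == "__main__":
    for machine in startup_ms:
        for kind, (total_s, pct) in overhead(machine).items():
            print("%-8s %-6s %7.0fs %5.1f%%" % (machine, kind, total_s, pct))
```
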
>>
>> It certainly appears that cpython process startup and "initializing" the
>> Mercurial environment within a fresh Python interpreter account for a
>> significant share of the Mercurial test suite's run time.
>>
>> Inlining tests into Python instead of shell, as my patch series is
>> attempting, or using chg should significantly speed up the test suite.
>>
>> Activating my regular hgrc file slowed down `hg version` by ~40ms on my
>> desktop machine. That is statistically significant.
>>
>> On my desktop machine, Perl 5.18.2 runs in 15ms, Ruby 1.9.3p484 in 27ms,
>> PHP 5.5.9 in 50ms, and Git 1.8.1.2 runs in 12ms.
>>
>> The Python new process overhead could very well be contributing to the
>> perception that Mercurial is slow[er than Git]. You need <25ms latency
>> for commands to feel instantaneous. We are already beyond that for the
>> most trivial Mercurial commands.
>
> I sent these observations/complaints to python-dev [1]. The IMO most
> useful response so far is [2], with its reference to PEP 432 [3] and its
> hopeful incorporation into Python 3.5.
>
> python -S does drop process overhead significantly (by disabling site.py
> processing). We could theoretically add that to hg's shebang. But we'd
> likely have to reimplement a lot of site.py's functionality. It would
> make a good experiment. chg is arguably a better solution.
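
The effect of `python -S` mentioned above can be checked on any machine with a sketch like the one below. It only compares interpreter startup with and without site.py processing. The helper names and run count are assumptions, and it says nothing about how much of site.py's functionality `hg` would need to reimplement.

```python
import statistics
import subprocess
import sys
import time


def median_ms(argv, runs=11):
    """Median wall-clock time in milliseconds to run argv."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def compare_site_overhead(runs=11):
    """Return (with_site_ms, without_site_ms) for the current interpreter."""
    with_site = median_ms([sys.executable, "-c", "1"], runs)
    without_site = median_ms([sys.executable, "-S", "-c", "1"], runs)
    return with_site, without_site


if __name__ == "__main__":
    with_site, without_site = compare_site_overhead()
    print("default: %.1f ms, -S: %.1f ms" % (with_site, without_site))
```
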
It's a shame that CPython doesn't offer a daemon mode automatically, since
that would help *any* command-line Python program (especially one invoked
from the prompt).