run-tests.py refactor

Thu Apr 24 21:08:11 CDT 2014

On 04/24/2014 02:42 PM, Gregory Szorc wrote:
> I'm working on a significant refactor of run-tests.py. My work can be
> found in the indygreg/run-tests-refactor bookmark of
> https://bitbucket.org/indygreg/hg. I plan to send the patches to this
> list once the 3.0 window is behind us. But I wanted to give people a
> heads up in case there are comments before the patch bombing.

Nice to see people showing interest in the test suite! Are you aware
that Anurag Goel will spend his GSoC working on it? You two should
probably talk and synchronize.

> There are two overarching goals to this project:
>
> 1) Make the Mercurial tests more embeddable and consumable from external
> testing frameworks.

Sounds like a good idea. This should let us leverage existing tool
suites for our own needs.

> 2) Make the tests faster.

Noble goal. Most core devs have access to gigantic machines to run the
test suite. But having it faster would help new contributors and offline
work, and would still save core devs a lot of time.

> As a side-effect of both, I believe I've cleaned up run-tests.py
> significantly and made it easier to understand and hack on.
>
> First, as someone who writes a lot of 3rd party extensions and hooks, I
> find it difficult to integrate run-tests.py into my testing processes.
> Specifically, run-tests.py doesn't play nice with existing Python
> testing tools, such as unittest and nose. Every time I set up a new
> testing environment, I have to reinvent the wheel or copy some code.
> Things are fragile and I dare say the barrier to entry is high enough
> that it discourages testing. Testing should be encouraged by making it
> turnkey.
>
> To facilitate easier testing, my patch series converts run-tests.py to
> use the Python standard library's unittest package for declaring and
> running tests. Individual test cases are unittest.TestCase instances.
> There is a custom unittest.TestSuite that knows how to run a collection
> of Mercurial tests. There is a custom unittest.TestRunner that knows how
> to output results that should be identical to what run-tests.py outputs
> today. The goal is that external testers can instantiate these custom
> classes and easily plug them into existing testing tools, thus lowering
> the barrier to testing.

Cool, were you able to cleanly preserve the ability to run tests in
parallel? Did you measure any overhead from using the unittest module
(not that I expect any, but I am wondering)?

> Refactoring run-tests.py to work in a unittest world had the beneficial
> side-effect of forcing the code to become more... robust. Reliance on
> global variables has been nearly eliminated. The code for parsing and
> executing .t tests now lives in a single class and is easily importable.
> (This may make https://bitbucket.org/brodie/cram unnecessary.) Things
> like temporary directories are managed via unittest primitives such as
> setUp() and tearDown(). Code for handling failures is streamlined, etc.
> IMO it's a long overdue cleanup.

w00t, cool and thanks. The unittest module definitely needed some love.

> The execution time of the Mercurial tests is a common complaint. I did
> some profiling and determined that the tests were spending an awful lot
> of time in overhead of invoking the "hg" process. By executing tests in
> shells, we have to incur the process start-up and repository
> "re-association" costs for every invocation of "hg." The overhead is
> significant.
>
> I added an experimental "pysh" mode that parses shell commands into
> Python functions. If a .t file consists of only shell commands that can
> be inlined to Python, the test executes in pure Python. For "hg" calls,
> it creates a new mercurial.dispatch.request and calls
> mercurial.dispatch.dispatch().

Note that we could also use the command server + chg for that. This
would have the nice side effect of testing the command server too. Have
you tried that path?

> The results of inlining .t tests into Python is *very* encouraging.
> test-bisect2.t (the largest test in terms of file size that can be
> inlined) drops from ~14s wall to ~1.8s wall (time ./run-tests.py -l
> --pysh test-bisect2.t)! Even a small test like test-resolve.t drops from
> ~1.6s to ~0.7s. That's the good news.

That is pretty impressive.

>
> The bad news is that only ~54 of the ~425 existing .t tests can be
> executed in pure Python. And, even with inlining, total wall time
> execution for the entire test suite only dropped by 20-30s (i7-2600K - 4
> + 4 core -j8). The reason we can't inline more tests is because the
> tests are doing things with the shell that can't yet be parsed into
> Python functions.

20-30 seconds out of what total time?

Could we imagine a hybrid approach, where we stay in Python as much as
possible and fall back to shell when needed?

> This brings us to an interesting crossroads. I've identified separate
> process overhead of commands inside tests (notably hg invocations) as a
> significant factor contributing to slow test execution. I can entertain
> the argument that invoking hg from the same process multiple times
> instead of from isolated processes does taint the effectiveness of the
> test (we're not measuring real world conditions any more). But, I think
> hooking in at mercurial.dispatch.dispatch() - effectively what "hg" does
> - isn't a significant departure. And, since it buys us a massive speedup
> win, it's hard to ignore that benefit. While mpm and crew may insist on
> running tests in shell mode for official acceptance testing, developers
> would greatly benefit from the "99% accurate" pure Python mode (I think).

I think it would be very sensible to have a "fast" mode for tests that
catches 90% of the issues we can run into. Buildbot and final acceptance
tests can still run on the shell-calling version. But if the in-process
version behaves well enough, it seems a bad idea to ignore it.

I personally always run tests with --local to shave a couple of seconds
during development on my laptop. Then I push to a 64-core POWER7 machine
to run the whole test suite before patch bombing.

> Assuming there is buy-in to executing tests in pure Python, complex
> shell commands will continue to undermine the efficiency of the test
> suite. We have a number of options here:
>
> 1) Start rewriting .t tests to the subset of shell we can convert to Python
> 2) Support more shell primitives in Python (this becomes hard fast and I
> don't like parsing shell for various reasons)
> 3a) Establish separate, independent sections in .t tests and allow mixed
> mode execution
> 3b) Split .t tests into multiple files
> 4) Establish a new test syntax for denoting Python commands. e.g. "%
> mkdir foo" would be "execute this Python function with arguments." With
> this approach, we could convert Python "commands" into shell and execute
> in shell mode. I think that's easier than parsing shell.

I think that mixed mode is the way to go (3a). Then, for the few shell
calls that are ubiquitous, we could probably offer Python versions
(a mix of (2) and (4)).

> These solutions all require a significant amount of effort. And, since
> my patch series so far has focused on maintaining backwards
> compatibility, I didn't want to start down a potentially dead-end project.
>
> You can make the argument that instead of inlining tests into pure
> Python we should be making hg process invocation faster. I agree that
> would be a worthwhile effort. However, no matter how efficient you make
> hg process invocation, it will still be slower than reusing an existing
> Python process. Thus, inline Python tests will always be faster than
> shell tests. Given the number of Mercurial tests, I can't imagine that
> difference being less than 20+ seconds and thus will always be relevant
> to developers wanting to quickly iterate. I therefore argue that
> investment in pure Python tests is not misplaced.

A faster and better test suite is never a misplaced investment. It is
just rarely prioritized amid the usual emergencies.

> Anyway, I just wanted to give people context before the massive patch
> bomb arrives. I hope you find this work beneficial. Hopefully it can be
> used to power a 5x faster test suite in the not so distant future.

Good luck on this quest. Make sure you get in touch with Anurag (the
GSoC student); he can probably give you a hand on multiple issues.

-- 
Pierre-Yves