[PATCH STABLE] tests: run test-check-code* last
matt_harbison at yahoo.com
Mon Sep 10 20:31:52 CDT 2012
Mads Kiilerich wrote:
> On 09/10/2012 05:14 AM, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison <matt_harbison at yahoo.com>
>> # Date 1347244676 14400
>> # Branch stable
>> # Node ID 299d1005b72ce3f803f6b1dff0a7788fc73710f0
>> # Parent 3ee5d3c372fabcf57c305835dac98da78bdc1837
>> tests: run test-check-code* last
>> Part of check-code.py scans the output of the tests and helpfully
>> indicates that (glob) needs to be used on paths for Windows
>> compatibility. But since the tests are run in sorted order, the
>> majority of the tests get run after check-code and will need a
>> second run of the tests to pick up the problem.
> It is not clear to me what problem you are solving and how you are
> solving it by this change. Is the hidden use case that you are running
> the test suite with -i and only want to run it once?
Yes. Sorry, I should have made that whole message more clear.
> I would argue that it is a good idea to run the test suite again after
> making any changes anyway.
Depends on the nature of the changes. I see the test suite as a system
to make sure output from hg doesn't change unexpectedly as the code
evolves, even for someone who knows almost nothing about implementation
details. Therefore, it seems intuitive to me that I need to rerun the
whole thing if an unexpected output change causes me to go back and
change code- something somewhere else could have gotten broken. But if
I add a test, run the whole thing with -i and it looks correct upon
review, it seems equally intuitive that I'm done- the output *should*
be the same if the code doesn't change, with a few obvious exceptions.
Knowing this is a problem, I would agree that a rerun (of check-code) is
a good idea. But I'd guess that most people aren't aware of this issue
until they run into it.
> It could also be argued that check-code by checking everything and
> "failing" quite often would be a good candidate for running very early.
Agreed, and I thought about running it first _and_ last. But I figured
it runs relatively early now, and I don't recall anyone else mentioning
wanting it to run first, so I thought maybe it isn't that big of a
concern. Maybe it needs to be split into a style check for the code and
the command lines of the tests, and then a check for the output lines of
the tests? It seems that would be the best of both worlds (early style
failure, late test cleanup), unless there's a lot of overlap.
> It could be nice to have a mechanism for smart ordering of the tests.
> * tests that have failed recently should be run first.
> * "important" tests should be run first. I guess that some 5% of the
> tests covers 95% of the code and would catch 95% of the failures.
> * some dependencies. If the basic tests for some area fails then it
> probably would make less sense to test more complex scenarios in the
> same area.
> * tests should be scheduled evenly across multiple runners to ensure
> that all runners are kept busy until they all finish, for example by
> using execution times from last run and run slow tests first and fill up
> with faster tests.
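That last point could be sketched roughly as below. This is just an
illustration under my own assumptions (a hypothetical timings dict and
helper name); I'm not claiming run-tests.py works, or should work, this
way:

```python
# Hypothetical sketch: start the slowest tests (by last run's timings)
# first, so parallel runners stay evenly busy near the end of the run.
def order_by_last_duration(tests, last_durations):
    """Sort tests longest-first; tests with no recorded timing run
    first, since their cost is unknown."""
    return sorted(tests,
                  key=lambda t: last_durations.get(t, float('inf')),
                  reverse=True)

tests = ['test-a.t', 'test-b.t', 'test-c.t', 'test-new.t']
last_durations = {'test-a.t': 1.2, 'test-b.t': 30.5, 'test-c.t': 4.0}
print(order_by_last_duration(tests, last_durations))
# -> ['test-new.t', 'test-b.t', 'test-c.t', 'test-a.t']
```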
Are the 'hg serve' commands sprinkled across various *.t going to get in
the way of parallelizing tests? I see there's a -j already; I'm just
not sure how it protects against bind errors.
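For what it's worth, one generic way a harness can dodge bind errors
when many servers start in parallel is to ask the OS for a free port by
binding to port 0 (a sketch of that general technique only; I don't know
whether run-tests.py does anything like this):

```python
import socket

def get_free_port():
    """Bind to port 0 so the OS picks an unused port, then return it.
    Note there is a small race: the port could be taken between close()
    and the server's own bind."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(('localhost', 0))
    port = s.getsockname()[1]
    s.close()
    return port

port = get_free_port()
print(port)
```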
> That mechanism could also be used to run check-code early or late if we
> could agree on which is best. I don't think it is a good idea to
> hardcode a policy now.
Fair enough. I figured this is probably a very specific instance of a
general case that would be useful, but I hadn't thought of those points
you raised. Is there a general plan for doing any of this, or is it
still being discussed? (I think I vaguely remember a thread on the ML a
few months back about balancing tasks across multiple runners). This
seems pretty ambitious and I don't know nearly enough python to pull
this off. But I could probably help with bite-sized chunks if the path
to getting there is fairly well known.
Since this started because of a (glob), I'm also wondering- it looks
like the test suite will ignore diffs if only (glob) lines have changed
(obviously), but if some normal line also changes, it replaces the glob
lines with the actual output. Is this by design, or an implementation
detail?
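For anyone following along, my mental model of (glob) matching is
something like the following. This is a deliberately simplified
illustration using fnmatch, not run-tests.py's actual matcher (which,
as I understand it, has its own rules for what "*" may cross):

```python
import fnmatch

def glob_line_matches(expected, actual):
    """Compare one expected test-output line against the actual line.
    A line ending in ' (glob)' is matched with shell-style wildcards,
    so '*' can absorb platform differences such as '\\' vs '/' in
    paths; any other line must match exactly."""
    suffix = ' (glob)'
    if expected.endswith(suffix):
        return fnmatch.fnmatch(actual, expected[:-len(suffix)])
    return expected == actual

# A Windows-style path still matches the globbed expectation:
print(glob_line_matches('saving to * (glob)', 'saving to C:\\tmp\\x'))
```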