Potential improvements to run-tests.py

Mon Oct 27 06:35:09 CDT 2014

On 10/26/2014 10:21 PM, Gregory Szorc wrote:
>>> * Ability to declare your own variables for substitutions. When
>>> comparing output, we substitute variables/strings like $HGPORT. These
>>> come from a hard-coded list. I want to make it possible for the test
>>> to define its own variables.
>> What syntax would you use for that? Do you see any use cases for that in
>> the Mercurial tests?
> I imagine we'd append "FOO=value" lines to a file and then parse said
> file from Python at the end of the test.
>
> Combined with HGPORT coming from the test execution itself, a lot of
> code and value currying could be removed from run-tests.py.
>
> The functionality could also be used for integration tests. Say you want
> to run tests against a staging instance of your Mercurial server. That
> URL is dynamic. With variables/replacements coming from the test itself,
> you can insert the dynamic URLs easily, without resorting to (glob) or (re).

I guess that would mean that a good first milestone could be to 
introduce this file (kind of reusing the existing pattern for 
DAEMON_PIDS) and seed it with the existing replacements (including HGPORTs).

If you introduce the feature of having custom replacements, please also 
take a quick sweep through the tests and start using the feature so we 
can benefit from it.

FWIW: I do not see a big problem with the existing HGPORTS handling. It 
pretty much works for our needs and I doubt it can be done any simpler. 
It will be interesting to see how you can improve it with something more 
powerful. I suggest you put that at the end of the patch series so it 
can be considered separately.

>> If using automatic listen port assignment, after the process (hgweb or
>> something else) has started listening on some automatic port, how would
>> you retrieve the port number and so other commands could use it,
>> efficiently and in shell script? It could of course be done, but it is
>> not completely trivial.
> Yeah, I'm not sure. Maybe we should add a back door to `hg serve` that
> writes out values to a file or makes them more easily parsed from the
> output (writing to fd4 or something).

You still need to get the port number into some variable so other 
commands can be told which port to connect to. For the shell tests that 
would be an environment variable. I don't see how it can be any simpler 
than just putting it there in the first place.

>
>> AFAICS, it would also require that substitutions should be dynamic and
>> change while the test output is processed. Again, more complexity, and
>> I'm not sure it is worth it.
> Substitutions only come into play after the test script has finished
> executing: output is not munged as it is generated. I think your concern
> assumes otherwise?

I guess that depends on exactly how you plan to do it.

I guess my concern here is what you seem to plan to address by writing 
an entry to the substitution file every time a new port has been used. 
That might be a oneliner, but yet another one, and one to repeat and 
execute in several places.

>
>> Instead, I would suggest that the runner should pre-scan the test text
>> for which ports a test needs, allocate random ports and verify they
>> really are free, set the environment variables, and finally fail the
>> tests if the ports still are in used after the test has completed.
>> (TIMEWAITish issues might make that tricky to implement reliably.)
> Scanning the test file for port requests is an interesting idea: I like it!
>
> It does need some thought around pre-allocation though. IMO the existing
> port allocation code is a mess. Having tests defer choosing a port
> number until execution time seems like the simplest approach for the
> test harness. But potentially more complicated for tests themselves.

In what way is it a mess?

Yes, it makes a gross approximation by assuming it is possible to 
"allocate" ports. TOCTOU issues are very possible. I agree the right 
solution to that is to let the listening process allocate a port ... but 
I don't think it is feasible. The cure seems worse than the problem.

IIRC we use ports that are outside the range most OSs will allocate 
dynamically. If not, we can change the port range or give advice on how 
to change the OS configuration to make it more reliable.

We could get better isolation of simultaneously running run-tests by 
introducing some simple locking scheme.

When on Linux (Unix?), we could get better lightweight isolation by 
using Jelmer's http://cwrap.org/ ... or use separate docker instances 
for each runner.

> Also, I *think* TIME_WAIT isn't a concern if you just bind() a TCP
> socket and don't accept() any connections.

I agree - it can't be TIME_WAIT without connections ... but think I have 
seen something like that on Windows. That might be pure FUD. Time will 
tell ;-)

PS: We are in a feature freeze and should be testing/fixing the RC 
instead of discussing future development ;-)

/Mads