Windows test suite timeouts

Matt Mackall mpm at selenic.com
Wed Jan 9 17:46:34 CST 2013


On Tue, 2013-01-08 at 00:33 -0500, Matt Harbison wrote:
> Matt Mackall wrote:
> > On Sat, 2013-01-05 at 21:35 -0500, Matt Harbison wrote:
> >> Mads Kiilerich wrote:
> >>> Dirkjan Ochtman wrote, On 01/05/2013 10:40 AM:
> >>>> On Fri, Jan 4, 2013 at 4:06 AM, Matt Harbison
> >>>> <matt_harbison at yahoo.com>  wrote:
> >>>>> Every so often, I've noticed that some tests on Windows will take a
> >>>>> really
> >>>>> long time, and then timeout:
> >>>>>
> >>>>> $ python run-tests.py -i test-largefiles.t
> >>>>>
> >>>>> ERROR: c:\Users\Matt\Projects\hg\tests\test-largefiles.t timed out
> >>>>> t
> >>>>> Failed test-largefiles.t: timed out
> >>>>> # Ran 1 tests, 0 skipped, 1 failed.
> >>>> Actually, on Gentoo, test-largefiles.t and test-mq.t have been timing
> >>>> out for a bunch of users, so I'm guessing that test's problems aren't
> >>>> only on Windows.
> >>> These tests _are_ big and will reach the timeout limit on slow or
> >>> overloaded hardware. The timeout had to be increased on some of the
> >>> buildbots.
> >>>
> >>> In both cases: Are you sure the tests really are hanging, or are they
> >>> just too slow?
> >>>
> >>> /Mads
> >> It looks like that was the problem:
> >>
> >>     $ time python run-tests.py -t 600 test-largefiles.t
> >>     .
> >>     # Ran 1 tests, 0 skipped, 0 failed.
> >>
> >>     real    3m58.038s
> >>     user    0m0.000s
> >>     sys     0m0.062s
> >>
> >> I'm really surprised it is that far above the 3 minute default timeout,
> >> since it passes so consistently until it starts failing consistently.
> >
> > Perhaps you could time 20 runs in a row from a cold boot to look for a
> > trend.
> >
> >> But this pretty much tracks with what I've seen: it happens after a long
> >> uptime, clears on reboot, and the test ran 3 times in a row with the raised
> >> timeout (and both of your patches applied).  I'm not sure why it works
> >> in a Linux VirtualBox VM on the same machine, or why the full .t.err is
> >> generated, but at least we got to the bottom of it.
> >
> > Have you looked at how long it takes to run this test on Linux?
> >
> > $ time ./run-tests.py -l test-largefiles.t
> > .
> > # Ran 1 tests, 0 skipped, 0 failed.
> >
> > real	0m32.142s
> > user	0m19.401s
> > sys	0m7.680s
> >
> > That's on a virtual machine on the same box that serves
> > mercurial.selenic.com, which is under a steady load of mail and web
> > traffic. It's running on a $400 machine I built in 2008:
> >
> >   model name	: Intel(R) Core(TM)2 Duo CPU     E7300  @ 2.66GHz
> >
> > That's the slowest thing I've got convenient access to. If I had my
> > Raspberry Pi plugged in, it would probably also smoke Windows.
> >
> > People assume that Windows and OS X and Linux are roughly comparable in
> > performance. It just ain't so, folks. Linux absolutely murders the other
> > two on fork/exec, syscall, and filesystem lookup intensive benchmarks,
> > which is what our test-suite amounts to.
> >
> 
> For comparison purposes, I ran the tests 15 or so times on Windows and 
> on Linux prior to rebooting.  Linux was tightly clustered (all were 
> greater than 58s and less than 60s, except two at 64s and one at 57s), 
> like so:
> 
> real	0m58.617s
> user	0m42.814s
> sys	0m12.179s
> 
> Windows was scattered between 3m21.422s and 3m28.857s, with an outlier 
> at 3m36.583s before the reboot.  The only difference from the 3m58 run above 
> is that I killed Firefox before these tests, which was using 1GB+ of memory 
> (though 4GB of physical memory was still free).
> 
> After rebooting, Windows was quicker, but still all over the place (and 
> the first run _did_ time out when I forgot to raise the value):
> 
> real    3m17.855s
> real    3m7.154s
> real    3m4.096s
> real    3m8.339s
> real    3m11.101s
> real    3m5.750s
> real    3m8.136s
> real    3m17.137s
> real    3m14.486s
> real    3m17.247s
> real    3m15.250s
> real    3m8.901s
> real    3m19.150s
> real    3m14.126s
> real    3m10.976s
> real    3m12.504s
> real    3m12.349s
> real    3m10.663s
> real    3m12.894s
> real    3m13.784s
> 
> Linux after the reboot was slightly (~4s) quicker and just as tight 
> (greater than 54.7s and less than 55.8s).  I then rebooted the Linux 
> VirtualBox VM (usually I just save the state when quitting), and the time 
> fell further (greater than 41.2s and less than 42.3s, except two runs at 
> 44s).  I think that lines up reasonably with your VM results, since I have 
> a Core i7 Q840 @ 1.67GHz, and the difference from your results is ~10s.

Yeah, it's comparable. The faster-after-reboot thing is surprising
though. Unless there's memory pressure in the VM, it should be pretty
constant.
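
If you want to rule memory pressure out, here's a quick sketch of a check
from inside the VM -- just stock Linux tools, nothing Mercurial-specific --
run before and after a test run:

  $ free -m       # the Swap: line shows whether the guest is using swap
  $ vmstat 5 3    # watch the si/so columns for swap-in/swap-out activity

If si/so stay at zero and swap usage doesn't grow across a run, memory
pressure probably isn't the explanation.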

> So I'm not sure there's any conclusion to be drawn about the 
> test suite itself, other than that maybe the default timeout should be 
> raised (perhaps conditionally for Windows?).  I know an environment variable 
> can be set to override it, but others likely won't know about it or set it 
> until they run into this and spend time investigating.

Looking at the buildbot, we use a 300s (5m) timeout on OS X and a 1200s
(!) timeout on Win 2008. Both take about an hour to get through the test
suite. We should probably bump the default up.
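
In the meantime, anyone hitting the limit can raise it by hand. A minimal
sketch: -t is the flag used in the runs quoted above, and HGTEST_TIMEOUT is,
if I'm remembering the run-tests.py defaults right, the environment variable
being referred to -- double-check it against your copy:

  # one-off: allow this run 600 seconds per test
  $ python run-tests.py -t 600 test-largefiles.t

  # or set it once for the whole session (sh/bash syntax)
  $ export HGTEST_TIMEOUT=600
  $ python run-tests.py test-largefiles.t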

Alternatively, we could trim some fat from the largefiles test itself.
--debug might be useful here for spotting where the slow bits are.
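
One rough way to do that (a sketch; it assumes a GNU awk on the path, which
the MSYS shell usually has, so strftime() is available):

  # timestamp every line of the test's --debug output; big gaps between
  # timestamps point at the slow commands
  $ python run-tests.py --debug test-largefiles.t 2>&1 \
      | awk '{ print strftime("%H:%M:%S"), $0 }'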

-- 
Mathematics is the supreme nostalgia of our time.



