Windows test suite timeouts
Matt Mackall
mpm at selenic.com
Wed Jan 9 17:46:34 CST 2013
On Tue, 2013-01-08 at 00:33 -0500, Matt Harbison wrote:
> Matt Mackall wrote:
> > On Sat, 2013-01-05 at 21:35 -0500, Matt Harbison wrote:
> >> Mads Kiilerich wrote:
> >>> Dirkjan Ochtman wrote, On 01/05/2013 10:40 AM:
> >>>> On Fri, Jan 4, 2013 at 4:06 AM, Matt Harbison
> >>>> <matt_harbison at yahoo.com> wrote:
> >>>>> Every so often, I've noticed that some tests on Windows will take a
> >>>>> really
> >>>>> long time, and then timeout:
> >>>>>
> >>>>> $ python run-tests.py -i test-largefiles.t
> >>>>>
> >>>>> ERROR: c:\Users\Matt\Projects\hg\tests\test-largefiles.t timed out
> >>>>> Failed test-largefiles.t: timed out
> >>>>> # Ran 1 tests, 0 skipped, 1 failed.
> >>>> Actually, on Gentoo, test-largefiles.t and test-mq.t have been timing
> >>>> out for a bunch of users, so I'm guessing that test's problems aren't
> >>>> only on Windows.
> >>> These tests _are_ big and will reach the timeout limit on slow or
> >>> overloaded hardware. The timeout had to be increased on some of the
> >>> buildbots.
> >>>
> >>> In both cases: Are you sure the tests really are hanging, or are they
> >>> just too slow?
> >>>
> >>> /Mads
> >> It looks like that was the problem:
> >>
> >> $ time python run-tests.py -t 600 test-largefiles.t
> >> .
> >> # Ran 1 tests, 0 skipped, 0 failed.
> >>
> >> real 3m58.038s
> >> user 0m0.000s
> >> sys 0m0.062s
> >>
> >> I'm really surprised it is that high above the 3 minute default timeout,
> >> since it works so consistently until it fails consistently.
> >
> > Perhaps you could time 20 runs in a row from a cold boot to look for a
> > trend.
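Timing a batch of consecutive runs and checking for a drift can be scripted. A minimal sketch in modern Python (the command shown is a placeholder; substitute the actual `run-tests.py` invocation, and the least-squares slope here is just one simple way to quantify a trend):

```python
import statistics
import subprocess
import sys
import time

def time_runs(cmd, n=20):
    """Wall-clock n consecutive runs of cmd (a list of argv strings)."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        samples.append(time.perf_counter() - start)
    return samples

def trend(samples):
    """Least-squares slope: change in seconds per successive run."""
    xs = range(len(samples))
    mx = statistics.mean(xs)
    my = statistics.mean(samples)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, samples))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Placeholder command -- substitute something like
# ["python", "run-tests.py", "-t", "600", "test-largefiles.t"]
samples = time_runs([sys.executable, "-c", "pass"], n=5)
print("slope: %+.4f s/run" % trend(samples))
```

A clearly positive slope over the batch would point at progressive degradation (the "slow after long uptime" pattern) rather than constant per-run overhead.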
> >
> >> But this pretty much tracks with what I've seen- happens after a long
> >> uptime, clears on reboot, and ran 3 times in a row with the raised
> >> timeout (and both of your patches applied). I'm not sure why it works
> >> in a Linux virtual box on the same machine, or why the full .t.err is
> >> generated, but at least we got to the bottom of it.
> >
> > Have you looked at how long it takes to run this test on Linux?
> >
> > $ time ./run-tests.py -l test-largefiles.t
> > .
> > # Ran 1 tests, 0 skipped, 0 failed.
> >
> > real 0m32.142s
> > user 0m19.401s
> > sys 0m7.680s
> >
> > That's on a virtual machine on the same box that serves
> > mercurial.selenic.com, which is under a steady load of mail and web
> > traffic. It's running on a $400 machine I built in 2008:
> >
> > model name : Intel(R) Core(TM)2 Duo CPU E7300 @ 2.66GHz
> >
> > That's the slowest thing I've got convenient access to. If I had my
> > Raspberry Pi plugged in, it would probably also smoke Windows.
> >
> > People assume that Windows and OS X and Linux are roughly comparable in
> > performance. It just ain't so, folks. Linux absolutely murders the other
> > two on fork/exec, syscall, and filesystem lookup intensive benchmarks,
> > which is what our test-suite amounts to.
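The gap is easy to see with a toy micro-benchmark along those lines. This sketch (not from the thread; just an illustration of the three operation classes mentioned) times a process spawn, a cheap system call, and a filesystem metadata lookup:

```python
import os
import subprocess
import sys
import time

results = {}

def bench(label, fn, n):
    """Record and print the average wall-clock cost per op, in microseconds."""
    start = time.perf_counter()
    for _ in range(n):
        fn()
    results[label] = (time.perf_counter() - start) / n * 1e6
    print("%-10s %12.1f us/op" % (label, results[label]))

# Process spawn (fork/exec on Unix, CreateProcess on Windows)
bench("spawn", lambda: subprocess.run([sys.executable, "-c", "pass"]), 5)
# A cheap system call
bench("syscall", lambda: os.getpid(), 100000)
# A filesystem metadata lookup
bench("stat", lambda: os.stat("."), 100000)
```

Running this on each platform makes the per-operation cost differences concrete; a .t test is essentially thousands of spawns and stats in a row, so small per-op gaps multiply into minutes.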
> >
>
> For comparison purposes, I ran the tests 15 or so times on Windows and
> on Linux prior to rebooting. Linux was tightly clustered (all runs were
> greater than 58s and less than 60s, except two at 64s and one at 57s),
> like so:
>
> real 0m58.617s
> user 0m42.814s
> sys 0m12.179s
>
> Windows was scattered between 3m21.422s and 3m28.857s, with an outlier
> at 3m36.583s before the reboot. The only difference from the 3m58 above
> is that I killed Firefox before these tests, which was using 1 GB+ of
> memory (though 4 GB of physical memory was still free).
>
> After rebooting, Windows was quicker, but still all over the place (and
> the first run _did_ timeout when I forgot to raise the value):
>
> real 3m17.855s
> real 3m7.154s
> real 3m4.096s
> real 3m8.339s
> real 3m11.101s
> real 3m5.750s
> real 3m8.136s
> real 3m17.137s
> real 3m14.486s
> real 3m17.247s
> real 3m15.250s
> real 3m8.901s
> real 3m19.150s
> real 3m14.126s
> real 3m10.976s
> real 3m12.504s
> real 3m12.349s
> real 3m10.663s
> real 3m12.894s
> real 3m13.784s
>
> Linux after the reboot was slightly (~4s) quicker and just as tight
> (greater than 54.7s and less than 55.8s). I then rebooted the Linux
> virtualbox (usually I just save the state when quitting), and the time
> fell more (greater than 41.2s and less than 42.3s, except two runs at
> 44s). I think this is consistent with your VM results, since I have a
> Core i7 Q840 @ 1.67GHz and the difference from your results is ~10s.
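Summarizing the spread of those quoted timings is easy to do programmatically. A small sketch (the parser and statistics here are illustrative, fed with the twenty post-reboot Windows figures from above):

```python
import re
import statistics

def parse_real(value):
    """Convert a `time` figure like '3m17.855s' into float seconds."""
    m = re.match(r"(?:(\d+)m)?([\d.]+)s$", value.strip())
    if not m:
        raise ValueError("unrecognized time: %r" % value)
    return int(m.group(1) or 0) * 60 + float(m.group(2))

# The twenty post-reboot Windows runs quoted above
runs = """3m17.855s 3m7.154s 3m4.096s 3m8.339s 3m11.101s
          3m5.750s 3m8.136s 3m17.137s 3m14.486s 3m17.247s
          3m15.250s 3m8.901s 3m19.150s 3m14.126s 3m10.976s
          3m12.504s 3m12.349s 3m10.663s 3m12.894s 3m13.784s""".split()
secs = [parse_real(r) for r in runs]
print("n=%d min=%.1fs max=%.1fs mean=%.1fs stdev=%.2fs"
      % (len(secs), min(secs), max(secs),
         statistics.mean(secs), statistics.stdev(secs)))
```

The roughly 15-second span between the fastest and slowest Windows run, against a sub-second spread on Linux, is the "all over the place" behavior in numbers.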
Yeah, it's comparable. The faster-after-reboot thing is surprising
though. Unless there's memory pressure in the VM, it should be pretty
constant.
> So I'm not sure any conclusion can be drawn about the test suite
> itself, other than that maybe the default timeout should be raised
> (perhaps conditionally for Windows?). I know an environment variable
> can be set to override it, but others likely won't know about it or
> set it until they run into this and spend time investigating.
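For reference, run-tests.py takes its defaults from environment variables; the timeout one is `HGTEST_TIMEOUT` if I remember right (check the `defaults` table in your checkout). The defaulting pattern looks roughly like this sketch:

```python
import os

# Sketch of run-tests.py-style env defaulting; the variable name
# HGTEST_TIMEOUT is an assumption -- verify against the `defaults`
# table in run-tests.py in your checkout. 180s is the default cited
# in this thread.
def default_timeout(env=None):
    env = os.environ if env is None else env
    try:
        return int(env.get("HGTEST_TIMEOUT", 180))
    except (TypeError, ValueError):
        return 180

print(default_timeout({}))                         # falls back to 180
print(default_timeout({"HGTEST_TIMEOUT": "600"}))  # honors the override
```

So a slow box can export the variable once in its shell profile instead of passing `-t` on every invocation.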
Looking at the buildbot, we use a 300s (5m) timeout on OS X and a 1200s
(!) timeout on Win 2008. Both take about an hour to get through the test
suite. We should probably bump the default up.
Alternatively, we should be trimming some fat from the largefiles test.
--debug might be useful here in spotting what the slow bits are.
--
Mathematics is the supreme nostalgia of our time.