buildbot failure in Mercurial on OS X 10.7 hg tests

David M. Carr david at carrclan.us
Wed Sep 19 18:04:41 CDT 2012


On Wed, Sep 19, 2012 at 6:00 PM, Matt Mackall <mpm at selenic.com> wrote:
> On Wed, 2012-09-19 at 08:52 -0700, hgbuildbot at kublai.com wrote:
>> The Buildbot has detected a new failure on builder OS X 10.7 hg tests while building hg.
>> Full details are available at:
>>  http://hgbuildbot.kublai.com/builders/OS%20X%2010.7%20hg%20tests/builds/254
>>
>> Buildbot URL: http://hgbuildbot.kublai.com/
>
> I probably stopped looking at these over a month ago because they were
> too noisy.
>
> In my opinion, the primary benefit of the buildbot (and the test suite
> in general) is spotting _regressions_. Tests that are unstable (even
> because of real bugs, as is probably the case here) cause more harm
> than good here: I'm not going to fix the bug personally, and if I have to
> click through three steps to discover that a particular report is just
> largefiles flapping in the breeze again, I'm not going to bother and I'm
> not going to notice real regressions that matter to me. I suspect the
> same is true for other people around here too.
>
> Similarly, test failures that turn a column red constantly for weeks are
> also harmful because they'll mask the appearance of new regressions we
> haven't heard about. The snare has already been sprung; it needs to be
> reset before it will catch any more issues.
>
> Thus, we need the normal, everyday state of the buildbot to be "all
> green" regardless of whether that corresponds with "all tests pass on
> all systems". The latter isn't all that meaningful anyway, given that we
> have ~300 open bugs on the BTS telling us everything is not perfect
> everywhere.
>
> So what should we do? I think the right thing to do with any flaky or
> consistently failing buildbot issue is to a) file a bug and b) blacklist
> the test until some effort has been made to address it, so buildbot
> results can go back to being green.
>
> --
> Mathematics is the supreme nostalgia of our time.
>
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel

Sounds like a reasonable course of action to me.  The tricky part will
be determining whether a given failure is a regression or a flaky
test, so that genuine regressions don't accidentally get swept under
the rug.  In most cases (such as this one), I suspect the
determination can be made quickly and fairly accurately just by
glancing at the failed test's output, assuming that it (and the
subsequent bug filing/blacklisting) is done by a person rather than a
script.
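
For illustration only, here is a minimal sketch (in Python) of the kind
of triage step described above: check each failed test against a
maintained list of known-flaky tests, each ideally tied to a filed bug,
and treat only the rest as potential regressions.  The file name
"known-flaky.txt", its format, and the script itself are assumptions
made up for this example, not existing Mercurial or buildbot tooling.

#!/usr/bin/env python
# Illustrative sketch only: triage buildbot failures against a list of
# known-flaky tests before anyone gets alerted.  The file
# "known-flaky.txt" and its format (test name, optional bug reference)
# are assumptions for this example, not real Mercurial/buildbot tooling.

import sys


def load_flaky(path):
    """Return a dict mapping test name -> bug reference (possibly empty)."""
    flaky = {}
    try:
        fp = open(path)
    except IOError:
        return flaky  # no list yet: treat every failure as a regression
    for line in fp:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        parts = line.split(None, 1)
        flaky[parts[0]] = parts[1] if len(parts) > 1 else ''
    fp.close()
    return flaky


def triage(failed, flaky):
    """Split failed test names into (probable regressions, known flaky)."""
    regressions, known = [], []
    for name in failed:
        (known if name in flaky else regressions).append(name)
    return regressions, known


if __name__ == '__main__':
    # usage: triage.py test-largefiles.t test-clone.t ...
    flaky = load_flaky('known-flaky.txt')
    regressions, known = triage(sys.argv[1:], flaky)
    for name in known:
        print('known flaky (%s): %s' % (flaky[name] or 'bug filed', name))
    for name in regressions:
        print('possible regression: %s' % name)
    # only probable regressions should turn the buildbot column red
    sys.exit(1 if regressions else 0)

If memory serves, Mercurial's run-tests.py already accepts a --blacklist
option naming a file of tests to skip, which would be one concrete way
to implement step (b) of Matt's proposal once a bug has been filed.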

-- 
David M. Carr
david at carrclan.us

