Testing Mercurial with Hypothesis

Pierre-Yves David pierre-yves.david at ens-lyon.org
Mon Feb 22 09:24:50 EST 2016



On 02/22/2016 03:15 PM, David MacIver wrote:
> Hi there,
>
> I write Hypothesis, a generative testing library for Python. I'm
> currently being contracted to do some testing of Mercurial using it.
>
> I've currently got a reasonable starting point for this, so this seems
> like a good time to throw it out into the wild for comments. Do bear in
> mind that this is very much a work in progress. You can see it at
> https://bitbucket.org/david_maciver_/mercurial/src/52e3b0d0fbb4d416b7d7936fd0f88dd61d3a158d/tests/test-verify-repo-operations.py?at=all&fileviewer=file-view-default
>
> What this does is use Hypothesis's stateful testing
> (http://hypothesis.readthedocs.org/en/release/stateful.html) to run
> various sequences of Mercurial operations against a repo and see if they
> break. Some "breakages" are acceptable and ignorable, currently
> identified by what error message Mercurial emits, while others are bugs
> - either in Mercurial or (more often, for now) in the test.
>
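For anyone who hasn't seen Hypothesis's stateful testing before: the
test defines a state machine whose rules each perform one repository
operation, and Hypothesis generates (and shrinks) sequences of those
rules. A minimal sketch of the shape of such a test follows; the rule
and command choices are illustrative assumptions, not the actual test
file:

    # Illustrative sketch only, not the actual
    # test-verify-repo-operations.py; the rule and command choices
    # here are assumptions for demonstration.
    import os
    import shutil
    import subprocess
    import tempfile

    from hypothesis.stateful import RuleBasedStateMachine, rule
    from hypothesis.strategies import binary, text

    class RepoOperations(RuleBasedStateMachine):
        """Drive random sequences of hg commands against a scratch repo."""

        def __init__(self):
            super(RepoOperations, self).__init__()
            self.repo = tempfile.mkdtemp()
            self.hg("init")

        def teardown(self):
            shutil.rmtree(self.repo, ignore_errors=True)

        def hg(self, *args):
            # Crude oracle: the real test distinguishes acceptable
            # failures by error message; this sketch just tolerates
            # exit codes 0 and 1 and treats anything else as a bug.
            rc = subprocess.call(("hg",) + args, cwd=self.repo)
            assert rc in (0, 1), "hg %s exited with %r" % (" ".join(args), rc)

        @rule(name=text("abcdef", min_size=1, max_size=8), content=binary())
        def write_and_commit(self, name, content):
            with open(os.path.join(self.repo, name), "wb") as f:
                f.write(content)
            self.hg("add", name)
            self.hg("commit", "-m", "hypothesis", "--user", "test")

    TestRepoOperations = RepoOperations.TestCase
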
> At the end it generates a file in Mercurial's .t format (using the
> --interactive mode; I tried writing a generator for this, but getting
> all the escaping right proved... challenging). If there were no
> unexpected errors in the run, it then runs that .t test against another
> version of Mercurial to compare.
>
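For readers who haven't met the .t format: it interleaves shell
commands, prefixed with "  $ ", with their expected output, and
run-tests.py compares actual output against expected (--interactive
offers to record differences back into the file). A toy hand-written
example, not one generated by the tool:

  $ hg init repo
  $ cd repo
  $ echo a > a
  $ hg commit -Am 'add a'
  adding a
  $ hg log -T '{rev} {desc}\n'
  0 add a
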
> The original idea for this testing was to compare different
> configurations of Mercurial (e.g. pure vs C extensions), though with the
> design that I've ended up with it also usefully tests a single version
> of Mercurial as a byproduct.
>
> I'm working with Simon Farnsworth on this. He ran it over the weekend
> on his development server, and it seems to have found at least one
> real bug (attached - Hypothesis normally minimizes better than this,
> but was running in a suboptimal configuration for doing so; we have a
> much smaller manually minimized example too). I expect that as people
> with more Mercurial background than I have get involved and find more
> interesting things for it to test, it will become even better at
> finding them.
>
> I'm open to and interested in comments in general, but I'm particularly
> interested in:
>
> 1. How would you like this integrated, if indeed it is a thing people
> are interested in seeing in core Mercurial at all? Is the prospect of
> running it as part of run-tests interesting, or would you prefer a
> different approach? In particular, what should be done with the
> generated failures?

I think that running them often would be valuable. We have a "long
running" test concept, disabled by default; putting these tests there,
with the buildbot running them, would be nice.
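
For a .py test like this one, the gating might look roughly like the
sketch below, assuming the slow-test convention works the way I believe
it does (run-tests.py only advertises the 'slow' hghave feature when
invoked with --allow-slow-tests, and exit code 80 means "skipped"):

    import os
    import subprocess
    import sys

    # Skip unless run-tests.py was started with --allow-slow-tests:
    # hghave exits non-zero when the 'slow' feature is unavailable,
    # and exit code 80 tells the runner the test was skipped.
    # (TESTDIR and the hghave script location are assumptions here.)
    hghave = os.path.join(os.environ['TESTDIR'], 'hghave')
    if subprocess.call([sys.executable, hghave, 'slow']) != 0:
        sys.exit(80)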

Having some easy way to start 100 explorations in parallel for a couple
of hours would probably be valuable too.
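
A rough sketch of what that fan-out could look like, assuming a
hypothetical single-run driver script (fuzz_once.py) that performs one
bounded Hypothesis exploration with a seed taken from the environment:

    import os
    import subprocess
    from multiprocessing.dummy import Pool  # thread pool; each worker
                                            # just waits on a subprocess

    def explore(seed):
        # fuzz_once.py and FUZZ_SEED are hypothetical names: a driver
        # that runs one independent exploration seeded from the env.
        env = dict(os.environ, FUZZ_SEED=str(seed))
        return subprocess.call(["python", "fuzz_once.py"], env=env)

    results = Pool(8).map(explore, range(100))
    print("%d of 100 runs failed" % sum(1 for rc in results if rc != 0))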

Finally, I'm sure we want to make sure any bug it finds stays covered.
I'm not sure whether we should write a dedicated manual test case for
each of them or rely on Hypothesis to save these cases and replay them
during any normal run-tests.py run.
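
For what it's worth, Hypothesis has machinery for both options: it
keeps a database of failing examples that it replays first on later
runs, and a known failure can also be pinned explicitly with @example.
A sketch using the plain function-based API (the property, names, and
database path below are placeholders, not anything from the WIP test):

    from hypothesis import example, given, settings
    from hypothesis.database import DirectoryBasedExampleDatabase
    from hypothesis.strategies import binary

    # Failing examples get saved here and replayed on every later run;
    # checking this directory in would share regressions across machines.
    db = DirectoryBasedExampleDatabase("tests/.hypothesis-examples")

    @settings(database=db)
    @given(data=binary())
    @example(data=b"\x00")  # a previously-found failure, pinned for good
    def test_latin1_roundtrip(data):
        assert data.decode("latin-1").encode("latin-1") == data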

> 2. Interesting suggestions for what to test, both specific operations
> and versions to compare.
>
> This is being tracked at
> https://www.mercurial-scm.org/wiki/HypothesisPlan. I'll try to keep it
> up to date with any suggestions (it needs a bit of updating already from
> the current WIP), but feel free to add to it directly.

-- 
Pierre-Yves David

