New testing framework

Greg Ward greg at gerg.ca
Mon Jun 14 08:35:16 CDT 2010


On Sat, Jun 12, 2010 at 8:52 AM, Adrian Buehlmann <adrian at cadifra.com> wrote:

[me, showing a rather hairy example from the bfiles test suite]
> if os.name == 'posix':
>    hgt.announce('bfupdate respects umask')
>    os.remove('sub/big3')
>    umask = os.umask(077)
>    try:
>        hgt.hg(['bfupdate', 'sub'],
>               stdout='1 big files updated, 0 removed\n')
>    finally:
>        os.umask(umask)
>    mode = os.stat('sub/big3').st_mode & 0777
>    hgt.assertequals(0700, mode, 'sub/big3: mode')

[Peter's reaction]
> I fully appreciate the need to get the tests running on Windows. But
> the above is simply too hard to read. One of my goals (and I guess
> Martin's) is to have examples in documentation that are actually
> tests. Which would again mean we need readable tests.

Guilty as charged.  I have concentrated on portability and
expressiveness rather than readability.  However...

[Adrian points out]
> I'm not sure how much that above testcase is representative for
> readability of the average testcase.

Exactly right.  That was a fairly hairy example that demonstrates the
power of testing in Python (direct access to system calls).

> How many of the tests deal with linux specific things like unix file modes?

Not many.  I've got ~900 lines of Python test code now and seven "if
os.name == 'posix'" checks, all related to Unix file permissions.

Here's a more mundane example, from bfiles' test-create.py.  The
original shell code:

"""
echo "% create and bfadd some 'big' files"
head -c100 /dev/zero > big1
head -c150 /dev/zero > big2
head -c200 /dev/zero > sub/big3

set -e
hg bfadd -v big1 big2

echo "% check files created"
hg bfstatus
hg status
listadmin
listpending
liststandins
"""

(listadmin, listpending, and liststandins are all shell functions that
basically do "find ... | sort".)
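
(In the Python version further down, those turn into assertion
helpers -- assertadmin(), assertpending(), assertstandins().  Boiled
way down -- this is a sketch of the idea, not the actual hgtest code,
and listfiles()/assertdir() are invented names -- such a helper looks
something like this:)

"""
import os

def listfiles(top):
    # Roughly what the shell helpers' "find ... | sort" produces:
    # every file under 'top', as a sorted list of relative paths.
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), top)
            found.append(rel.replace(os.sep, '/'))  # same result on Windows
    return sorted(found)

def assertdir(top, expected):
    # Fail loudly if the files under 'top' are not exactly 'expected'.
    actual = listfiles(top)
    assert actual == sorted(expected), (
        '%s: expected %r, got %r' % (top, sorted(expected), actual))
"""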

And here is the corresponding snippet of .out file:

"""
% create and bfadd some 'big' files
hg bfadd -v big1 big2
adding big1
adding big2
added 2 big files
% check files created
hg bfstatus
BPA big1
BPA big2
hg status
A .hgbfiles/big1
A .hgbfiles/big2
? sub/big3
contents of .hg/bfiles:
.hg/bfiles/dirstate
.hg/bfiles/latest/big1
.hg/bfiles/latest/big2
contents of .hg/bfiles/pending:
.hg/bfiles/pending
.hg/bfiles/pending/big1
.hg/bfiles/pending/big1/ed4a77d1b56a118938788fc53037759b6c501e3d
.hg/bfiles/pending/big2
.hg/bfiles/pending/big2/2894ac4ba83af7de1cdcef23d72e68aed68b6624
.hg/bfiles/committed: no such directory
contents of .hgbfiles:
.hgbfiles/big1
.hgbfiles/big2
"""

See, you're already getting whiplash from bouncing back and forth
between the shell script and the .out file.  Now here's how the same
test looks in Python, using hgtest:

"""
hgt.announce('create and add some "big" files')
hgt.writefile('big1', '\0' * 100, mode='wb')
hgt.writefile('big2', '\0' * 150, mode='wb')
hgt.writefile('sub/big3', '\0' * 200, mode='wb')

hgt.hg(['bfadd', '-v', 'big1', 'big2'],
       stdout='adding big1\nadding big2\nadded 2 big files\n')

hgt.announce('check files created')
hgt.hg(['bfstatus'],
       stdout='BPA big1\n'
              'BPA big2\n')
hgt.hg(['status'],
       stdout='A .hgbfiles/big1\n'
              'A .hgbfiles/big2\n'
              '? sub/big3\n')
hgt.assertadmin([
    'dirstate',
    'latest/big1',
    'latest/big2'])
hgt.assertpending([
    'big1',
    'big1/ed4a77d1b56a118938788fc53037759b6c501e3d',
    'big2',
    'big2/2894ac4ba83af7de1cdcef23d72e68aed68b6624',])
hgt.assertcommitted([])
hgt.assertstandins([
    'big1',
    'big2'])
"""

To me, this is already much much easier to read, understand, and
modify than the shell code.  Here's where it gets interesting: when I
run the script (run-tests.py --local --debug), the output is clear and
it's pretty obvious that the test is passing:

"""
% create and add some "big" files
hg bfadd -v big1 big2

% check files created
hg bfstatus
hg status
"""

The current policy: silence is golden, except for calls to
hgt.announce() and hgt.hg().  It gets even better when I sabotage
things to cause failure:

"""
% create and add some "big" files
hg bfadd -v big1 big2
FAIL:
-- expected stdout: ----------------------------------------
adding big1
adding big2
added 2 big files
-- actual (filtered) stdout: -------------------------------
adding big1
added 1 big file
------------------------------------------------------------
failure context:
  File "/home/greg/src/hg-bfiles/tests/test-create.py", line 28, in <module>
    stdout='adding big1\nadding big2\nadded 2 big files\n')
FAIL:
-- expected stderr: ----------------------------------------
-- actual (filtered) stderr: -------------------------------
big2: No such file or directory
------------------------------------------------------------
failure context:
  File "/home/greg/src/hg-bfiles/tests/test-create.py", line 28, in <module>
    stdout='adding big1\nadding big2\nadded 2 big files\n')

"""

(What was that about "too verbose"?  ;-)

Note that we have two distinct failures here from a single "hg bfadd"
command, since the stdout and stderr were both different from my
expected values.  IMHO this is a great big huge win over the status
quo:

  * no need to diff .out files to see failure; it's right there in the test output
  * it says "FAIL" in big obvious uppercase letters
  * it gives a fragmentary stack trace to show which line of test code is failing
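
In case anyone is curious what it takes to get that behaviour: here's
a heavily simplified sketch of the idea (not the actual hgtest source
-- the real thing also does output filtering, which I'm leaving out,
and hg_and_check() is an invented name):

"""
import subprocess, sys, traceback

def hg_and_check(args, stdout='', stderr=''):
    # Echo the command line; on success this is the only output.
    print('hg ' + ' '.join(args))
    proc = subprocess.Popen(['hg'] + args,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            universal_newlines=True)
    (out, err) = proc.communicate()

    # The caller's frame -- i.e. the line in the test script -- is
    # what gets reported as "failure context".
    context = traceback.format_stack(limit=2)[0]

    # stdout and stderr are compared independently, which is why a
    # single command can produce two FAIL reports.
    for (stream, expected, actual) in [('stdout', stdout, out),
                                       ('stderr', stderr, err)]:
        if actual != expected:
            print('FAIL:')
            print('-- expected %s: %s' % (stream, '-' * 44))
            sys.stdout.write(expected)
            print('-- actual %s: %s' % (stream, '-' * 46))
            sys.stdout.write(actual)
            print('-' * 60)
            print('failure context:')
            sys.stdout.write(context)
"""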

So I still like my approach, although it definitely needs work to
streamline the test code.  The maximum streamlining is probably
something like Martin suggested, where the test code is a new
mini-language that incorporates shell commands, expected output, and
Python assertions.  Or something like that.
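
Just to make that a bit more concrete, here's a completely made-up
taste of what such a file could look like: a '%' line is a banner, a
'$' line is a command followed by its expected output, and a '>>>'
line is an inline Python assertion.  None of this syntax exists yet;
it's only a sketch of the idea:

"""
% create and add some "big" files
$ hg bfadd -v big1 big2
adding big1
adding big2
added 2 big files
>>> assert os.path.exists('.hgbfiles/big1')
"""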

> Windows testing is today probably mostly done by a single person:
> Patrick. I've mostly abandoned trying to run the testsuite myself on
> Windows. I use an ubuntu in a VM on my Windows 7 if I need to send a
> mercurial patch. The rest is manual testing various specific cases when
> I'm interested in a specific case. It's simply the line of least resistance.

That's the clincher for me.  If people hacking Mercurial on Windows
find it's just too damn hard to run the tests on Windows, then it
won't get tested on Windows, and we won't find out it's broken until
it's been released.  I fear and loathe Windows and everything it
stands for... but I still want my code to run on it.

Greg

