Unified tests and graphlog output

Mon Sep 20 14:04:50 CDT 2010

On Mon, Sep 20, 2010 at 8:18 PM, Brodie Rao <brodie at bitheap.org> wrote:
> On Mon, Sep 20, 2010 at 12:48 PM, Adrian Buehlmann <adrian at cadifra.com> wrote:
>> I found a problem with unified tests containing glog calls (I noticed
>> this while trying to unify tests-rebase*).
>>
>> Assume having the following unified test file (test-test.t):
>>
>>
>>  $ echo "[extensions]" >> $HGRCPATH
>>  $ echo "graphlog=" >> $HGRCPATH
>>  $ hg init
>>  $ echo "foo" > a
>>  $ hg ci -A -m 1
>>  adding a
>>  $ echo "bar" >> a
>>  $ hg ci -m 2
>>  $ hg up 0
>>  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
>>  $ echo "bla" >> a
>>  $ hg ci -m 3
>>  created new head
>>
>>  $ hg glog --template '{rev}:{node|short} "{desc}"\n'
>>  @  2:20c4f79fd7ac "3"
>>  |
>>  | o  1:38f24201dcab "2"
>>  |/
>>  o  0:2a18120dc1c9 "1"
>>
>>
>> This all looks shiny and nice.
>>
>> But...
>>
>> Assume you edit the graphlog output part of that file to:
>>
>>
>>  $ hg glog --template '{rev}:{node|short} "{desc}"\n'
>>  @  2:20c4f79fd7ac "3"
>>  |
>>  | o  1:38f24201dcab "2" EVIL HACKERS WERE HERE
>>  |/
>>  o  0:2a18120dc1c9 "1"
>>
>>
>> Then running this test in a terminal goes like this:
>>
>>
>>  $ python run-tests.py test-test.t -i
>>  .
>>  # Ran 1 tests, 0 skipped, 0 failed.
>>
>>
>> Huh?
>>
>> The test doesn't detect that the command output does not match the
>> expected output?!
>>
>> The symptom is that iff there is a '|' character at the beginning of a
>> line, the rest of the line is irrelevant.
>>
>> Yikes.
>>
>> I haven't looked at the code of the test framework, but my first guess
>> is that we are hit here by the regular expression matching feature of
>> the unified tests.
>>
>> Even more evil: if -- alternatively -- you instead change the graphlog
>> part to:
>>
>>
>>  $ hg glog --template 'HACKED {rev}:{node|short} "{desc}"\n'
>>  @  2:20c4f79fd7ac "3"
>>  |
>>  | o  1:38f24201dcab "2"
>>  |/
>>  o  0:2a18120dc1c9 "1"
>>
>>
>> (note the "HACKED" inserted into the template option)
>>
>> and then accept the changed test file (test-test.t.err) after running
>> that test, you will end having:
>>
>>
>>  $ hg glog --template 'HACKED {rev}:{node|short} "{desc}"\n'
>>  @  HACKED 2:20c4f79fd7ac "3"
>>  |
>>  | o  1:38f24201dcab "2"
>>  |/
>>  o  HACKED 0:2a18120dc1c9 "1"
>>
>>
>> in your "test-test.t" file (note that "HACKED" has -not- been inserted
>> for node 1, because it is behind a '|' !).
>>
>> Can this be improved, so that unified tests are more graphlog-proof?
>
> You're right:
>
>  >>> import re
>  >>> re.match('|foo', 'bar')
>  <_sre.SRE_Match object at 0x1007f2ed0>
>
> This is a general problem of falling back to regular expression
> matching. It works both ways too; you could have a regular expression
> that literally matches the output before being compiled/processed.
>
> You could do one of the following:
>
> 1. Escape your graphlog output.
>
> 2. Require regular expressions to be specially marked. You'd have to
> update all the existing unified tests, but there's no chance you'd
> accidentally run into this issue again.
>
> #2 seems like the better long-term choice to me.

Agreed. Of course we'd have to escape output that conflicts with this
markup. I myself would prefer a trailing mark. So for instance:

  $ hg rebase
  rebasing...
  saved backup file to .* (re)
  $ echo '(re)'
  (re) (plain)

The idea being that if you find a closing ) on the line, you look for
the closest ( to its left and interpret the inner part as a directive.