D5288: tests: make test-alias.t pass with re2

Yuya Nishihara yuya at tcha.org
Tue Nov 20 07:18:00 EST 2018


Queued, thanks.

On Mon, 19 Nov 2018 18:40:57 +0000, valentin.gatienbaron (Valentin Gatien-Baron) wrote:
>   
>   $ python -c 'import re; print(re.compile("(.*)").match("aaa\xc0bbbb").groups())'
>   ('aaa\xc0bbbb',)
>   $ python -c 'import re2; print(re2.compile("(.*)").match("aaa\xc0bbbb").groups())'
>   ('aaa',)
>   
>   Apparently re2 stops when it encounters invalid utf8 (which I suppose makes sense
>   given that '.' matches what appears to be a codepoint rather than a byte). This is
>   presumably a bug in hg, but not very important, so just change the test to stick
>   to valid utf8.

So, if re2 assumes that input bytes are UTF-8, we shouldn't use re2 where
non-ASCII input may exist. '\xc0' is a valid latin-1 character.
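
To illustrate the point, here is a minimal Python 3 sketch using only the
standard library (re2 itself is not imported, so it runs without it). It shows
that the byte string from the test is not valid UTF-8 yet decodes fine as
latin-1, and that the stdlib re module matches it byte by byte. The helper
names is_valid_utf8 and is_ascii_only are hypothetical, made up for this
example; they are not part of hg or re2.

```python
import re

# Same bytes as in the test: ASCII text with a single 0xC0 byte in the middle.
sample = b"aaa\xc0bbbb"


def is_valid_utf8(raw):
    """Return True if raw decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def is_ascii_only(raw):
    """Return True if every byte is below 0x80 (hypothetical helper).

    A check like this is one conservative way to decide whether input is
    safe to hand to an engine that interprets bytes as UTF-8.
    """
    return all(b < 0x80 for b in raw)


# 0xC0 can never appear in well-formed UTF-8, so this is False.
utf8_ok = is_valid_utf8(sample)

# latin-1 maps every byte to a code point, so decoding always succeeds.
latin1_text = sample.decode("latin-1")

# The stdlib re module with a bytes pattern matches bytes, not code points,
# so "." consumes the 0xC0 byte like any other.
stdlib_match = re.compile(br"(.*)").match(sample).group(1)

print(utf8_ok, is_ascii_only(sample), stdlib_match == sample)
```

Whether an ASCII-only guard is the right condition for hg is a judgment call;
it is only meant to show what "no non-ASCII input" would mean in practice.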

