D5288: tests: make test-alias.t pass with re2
Yuya Nishihara
yuya at tcha.org
Tue Nov 20 07:18:00 EST 2018
Queued, thanks.
On Mon, 19 Nov 2018 18:40:57 +0000, valentin.gatienbaron (Valentin Gatien-Baron) wrote:
>
> $ python -c 'import re; print(re.compile("(.*)").match("aaa\xc0bbbb").groups())'
> ('aaa\xc0bbbb',)
> $ python -c 'import re2; print(re2.compile("(.*)").match("aaa\xc0bbbb").groups())'
> ('aaa',)
>
> Apparently re2 stops when it encounters invalid utf8 (which I suppose makes sense
> given that '.' matches what appears to be a codepoint rather than a byte). This is
> presumably a bug in hg, but not very important, so just change the test to stick
> to valid utf8.
So, if re2 treats its input bytes as UTF-8, we shouldn't use re2 where non-ASCII
input may exist. '\xc0' is a valid latin-1 character.
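
For illustration, a minimal sketch of the distinction above, assuming Python 3 and only the standard library (re2 itself is not imported here). The helper names is_valid_utf8 and valid_utf8_prefix are hypothetical and are not part of Mercurial or re2. The sketch checks that the byte 0xc0 decodes as latin-1 but not as UTF-8, and that the stdlib re module in bytes mode matches the whole input:

```python
import re

# The byte string from the quoted commands: "aaa", a lone 0xc0 byte, "bbbb".
SAMPLE = b"aaa\xc0bbbb"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def valid_utf8_prefix(data: bytes) -> bytes:
    """Return the longest leading slice of data that is valid UTF-8.

    Hypothetical helper: it only mirrors the 'aaa' result reported in the
    quoted message and makes no claim about how re2 works internally.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first undecodable byte.
        return data[:exc.start]
    return data


# 0xc0 maps to U+00C0 in latin-1, so this decode succeeds.
latin1_text = SAMPLE.decode("latin-1")

# With a bytes pattern, '.' in the stdlib re module matches any single byte
# other than a newline, so the group spans the whole input.
stdlib_groups = re.compile(b"(.*)").match(SAMPLE).groups()

print(is_valid_utf8(SAMPLE))
print(repr(latin1_text))
print(stdlib_groups)
print(valid_utf8_prefix(SAMPLE))
```

A check like is_valid_utf8 could in principle be used to decide per input whether a UTF-8-assuming engine is safe, but that is only one possible approach and not what the queued patch does; the patch just changes the test to use valid UTF-8.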