[PATCH STABLE] tests: escape bytes setting MSB in input of grep for portability

Augie Fackler raf at durin42.com
Mon May 23 15:13:13 EDT 2016


On Sat, May 21, 2016 at 02:53:51AM +0900, FUJIWARA Katsunori wrote:
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> # Date 1463766531 -32400
> #      Sat May 21 02:48:51 2016 +0900
> # Branch stable
> # Node ID 8c5e880c7e25e94354d312d582d2ba19ca419423
> # Parent  854556c5f3bf6493a99481a355c5112b2ea0ed37
> tests: escape bytes setting MSB in input of grep for portability

queued for stable, thanks

>
> GNU grep (2.21-2 or later) assumes that its input is encoded according
> to LC_CTYPE, and treats the input as binary if it contains a byte
> sequence that is not valid in that encoding.
>
> For example, if the locale is configured as C, a byte with the most
> significant bit (MSB) set makes such a GNU grep show a "Binary file
> <FILENAME> matches" message instead of the matched lines.
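To make the condition above concrete, here is a minimal Python 3 sketch of the check it implies: in the C locale, any byte with the MSB set is not valid ASCII. `has_msb_byte` is a hypothetical helper written for illustration only; it approximates the trigger described in the commit message and is not grep's actual detection logic.

```python
def has_msb_byte(data: bytes) -> bool:
    """Return True if any byte has its most significant bit set (>= 0x80)."""
    return any(b >= 0x80 for b in data)


# A few CP932 bytes taken from the expected test output, versus plain ASCII.
cp932_line = b"y - \x82\xb1\x82\xcc(yes)\n"
ascii_line = b"y - (yes)\n"

print(has_msb_byte(cp932_line))
print(has_msb_byte(ascii_line))
```

Only the first sample contains such bytes, which is the kind of input the affected grep versions report as binary.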
>
> This behavior is recognized as a bug, and is fixed in GNU grep 2.25-1
> or later. But some distributions ship such a buggy version
> (e.g. Ubuntu xenial, which is used by the launchpad buildbot).
>
>     http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19230
>     https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=800670
>     http://packages.ubuntu.com/xenial/grep
>
> This causes test-commit-interactive.t to fail, because it has applied
> grep to a CP932 byte sequence since 1111e84de635.
>
> However, explicitly setting LC_CTYPE for CP932 might cause another
> problem, because it can't be assumed that every environment running
> the Mercurial tests allows arbitrary locale settings.
>
> To resolve this issue, this patch escapes bytes with the MSB set in
> the input of grep.
>
> For this purpose:
>
>   - str.encode('string-escape') isn't useful, because it also escapes
>     control codes (less than 0x20), which complicates EOL handling
>
>   - "f --hexdump" isn't useful, because it isn't line-oriented
>
>   - "sed -n" seems reasonable, but "sed" itself sometimes causes
>     portability issues, too (e.g. 900767dfa80d or afb86ee925bf)
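The first bullet can be illustrated with a short Python 3 sketch. Two assumptions here: `escape_msb` is a hypothetical bytes-based counterpart of the escape.py helper in this patch, and the `unicode_escape` codec (applied via latin-1) is only a stand-in for Python 2's 'string-escape', which does not exist in Python 3.

```python
def escape_msb(line: bytes) -> bytes:
    """Escape only bytes with the MSB set; ASCII, including newline, is kept."""
    return b"".join(
        bytes([b]) if b < 0x80 else b"\\x%02x" % b
        for b in line
    )


def escape_all(line: bytes) -> bytes:
    """Stand-in for 'string-escape': control codes such as newline are escaped too."""
    return line.decode("latin-1").encode("unicode_escape")


sample = b"y - \x82\xb1\t(yes)\n"
msb_only = escape_msb(sample)    # tab and newline survive as real bytes
everything = escape_all(sample)  # tab and newline become backslash escapes

print(repr(msb_only))
print(repr(everything))
```

With the MSB-only variant the output stays line-oriented, so a line-based tool like grep still sees one record per input line. The full-escape variant folds the newline into the text, so the line ending would have to be restored separately.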
>
> This patch is posted with the "stable" flag, because 1111e84de635 is
> on the stable branch.
>
> diff --git a/tests/test-commit-interactive.t b/tests/test-commit-interactive.t
> --- a/tests/test-commit-interactive.t
> +++ b/tests/test-commit-interactive.t
> @@ -895,11 +895,24 @@ This tests that translated help message
>    $ LANGUAGE=ja
>    $ export LANGUAGE
>
> -  $ hg commit -i --encoding cp932 2>&1 <<EOF | grep '^y - '
> +  $ cat > $TESTTMP/escape.py <<EOF
> +  > from __future__ import absolute_import
> +  > import sys
> +  > def escape(c):
> +  >     o = ord(c)
> +  >     if o < 0x80:
> +  >         return c
> +  >     else:
> +  >         return r'\x%02x' % o # escape char setting MSB
> +  > for l in sys.stdin:
> +  >     sys.stdout.write(''.join(escape(c) for c in l))
> +  > EOF
> +
> +  $ hg commit -i --encoding cp932 2>&1 <<EOF | python $TESTTMP/escape.py | grep '^y - '
>    > ?
>    > q
>    > EOF
> -  y - \x82\xb1\x82\xcc\x95\xcf\x8dX\x82\xf0\x8bL\x98^(yes) (esc)
> +  y - \x82\xb1\x82\xcc\x95\xcf\x8dX\x82\xf0\x8bL\x98^(yes)
>
>    $ LANGUAGE=
>  #endif
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

