[PATCH v3] check-code: skip unhandled files

Martijn Pieters mj at zopatista.com
Wed May 18 16:17:35 EDT 2016


On 18 May 2016 at 20:48, timeless <timeless at gmail.com> wrote:
> timeless wrote:
>> +            with opentext(f) as fp:
>> +                header = fp.readline()
>
> This doesn't actually fix the problem I'm hitting w/ py3:
>
> +  Traceback (most recent call last):
> +    File "/home/timeless/hg/crew/tests/../contrib/check-code.py", line 632, in <module>
> +      header = fp.readline()
> +    File "/home/timeless/hg/py3/lib/python3.5/encodings/ascii.py", line 26, in decode
> +      return codecs.ascii_decode(input, self.errors)[0]
> +  UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 949: ordinal not in range(128)
>
> The file in question is CONTRIBUTORS.
> And really, this shouldn't be hard to get right, but, boy does python3
> make it hard.
> There's no reason that I can think of for readline() to go anywhere
> near position 949.

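.readline() works through a buffer, and that buffer is larger than 949
bytes. In text mode, Python 3 reads a whole chunk (8192 bytes by
default) and decodes all of it before looking for the newline, so a
non-ASCII byte anywhere in that chunk raises, even one well past the end
of the first line. Without a buffer, scanning for the next newline
separator would get really inefficient really fast.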
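A minimal reproduction sketch, assuming made-up file contents rather than the real CONTRIBUTORS file: the first line is plain ASCII, and the non-ASCII byte sits at offset 949, after the newline. Wrapping the bytes in an ASCII text layer (what a C-locale open() amounts to) still fails on the very first readline(), because the whole chunk is decoded at once. The function name is hypothetical.

```python
import io


def first_readline_error_position():
    # Hypothetical contents: an ASCII first line, then 0xc3 (the lead byte
    # of UTF-8 "é") at offset 949, far past the first newline.
    data = b"header line\n" + b"x" * 937 + b"\xc3\xa9\n"
    assert data.index(b"\xc3") == 949

    # Text layer with an ASCII decoder, like open() under the C locale.
    fp = io.TextIOWrapper(io.BytesIO(data), encoding="ascii")
    try:
        fp.readline()  # decodes the whole buffered chunk, not just line 1
    except UnicodeDecodeError as exc:
        return exc.start  # offset of the offending byte within the chunk
    return None
```

Calling `first_readline_error_position()` returns the offset reported by the decoder, matching the "position 949" in the traceback.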

Open the file in binary mode instead; .readline() still works and you
can match against bytes literals. Python doesn't support
ASCII-incompatible encodings like UTF-16 and UTF-32 for source code
anyway, so checking the header as bytes is safe.
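A sketch of the binary-mode approach, assuming check-code only needs the first line of each file. `read_header` and `looks_like_python` are hypothetical helpers for illustration, not the actual check-code.py code, and the shebang test is an assumed example of "matching against bytes".

```python
def read_header(path):
    # Binary mode: no decoding happens, so non-ASCII bytes anywhere in the
    # file (or in the same buffered chunk) cannot raise UnicodeDecodeError.
    with open(path, "rb") as fp:
        return fp.readline()


def looks_like_python(header):
    # Hypothetical check: compare against bytes literals, never str, since
    # read_header() returns bytes.
    return header.startswith(b"#!") and b"python" in header
```

Because the header stays as bytes, the same code behaves identically on Python 2 and 3 and regardless of the locale run-tests sets.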

> fwiw, I've tried using io.open, and it works fine in standalone
> testing but breaks in run-tests.

In Python 3, open() is io.open(). When no encoding is passed, it is
taken from the locale (locale.getpreferredencoding(False)), which I
think is set to C in run-tests, so the default encoding for opening
files is then ASCII.
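A small sketch of the locale dependence, assuming UTF-8 files if text mode is kept at all. `default_text_encoding` and `read_first_line` are hypothetical names. Note that on Python 3.7 and later, C-locale coercion (PEP 538) and UTF-8 mode (PEP 540) can change what a C locale yields, so the ASCII result is specific to older interpreters like the 3.5 in the traceback.

```python
import io
import locale

# In Python 3 the built-in open() is the same object as io.open().
assert open is io.open


def default_text_encoding():
    # What open() uses when no encoding= argument is given; under LC_ALL=C
    # on Python 3.5 this is typically ASCII ('ANSI_X3.4-1968').
    return locale.getpreferredencoding(False)


def read_first_line(path, encoding="utf-8"):
    # An explicit encoding= makes the result independent of whatever
    # locale run-tests (or the user's shell) happens to set.
    with open(path, encoding=encoding) as fp:
        return fp.readline()
```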

-- 
Martijn Pieters


More information about the Mercurial-devel mailing list