[PATCH 02 of 10] localrepo: bytes for errors

Martijn Pieters mj at zopatista.com
Wed May 18 13:43:50 EDT 2016


On 15 May 2016 at 20:34, Gregory Szorc <gregory.szorc at gmail.com> wrote:
> Playing around with a custom codec, I'm not convinced this is easier than
> hacking up module import.
>
> When you use a custom "# coding" line, the source file's bytes get passed to
> the codec's decode(). To convert string literals to bytes would require us
> to identify string literals and rewrite the source bytes to contain the "b"
> prefix. At the point you're parsing string literals, you've just reinvented
> Python's parser. So it feels to me that the proper layer to inject the
> automagical rewriting would be in the parser or ast level and that would
> require custom module loading.
>
> Fortunately, Python 3.5 has all the module import bits implemented in Python
> (as opposed to C), so we /should/ have the control we need to inject
> ourselves into module loading at the right layer.

At least the codec route has the advantage that it only has to run
once per source file revision; the result is cached as bytecode.
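For reference, the codec plumbing itself is small. Below is a minimal sketch. It uses a hypothetical codec name, hgsource, and a placeholder transform() hook where the literal rewriting would go. It also assumes the codec gets registered (for example from a .pth file or an already-imported package) before any module declaring "# coding: hgsource" is compiled. The import system compiles modules from bytes, so the plain decode() function is what gets used there. The buffering incremental decoder is an assumption about what reading a script directly from a file needs.

```python
import codecs

# Hypothetical codec name; a module would opt in with "# coding: hgsource".
CODEC_NAME = "hgsource"


def transform(text: str) -> str:
    """Hypothetical hook where the source rewrite (e.g. adding b'' prefixes) would go."""
    return text


def decode(data, errors="strict"):
    # The raw source bytes arrive here; return (text, number of bytes consumed).
    text, consumed = codecs.utf_8_decode(bytes(data), errors, True)
    return transform(text), consumed


class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
    # Buffer every chunk until the final one so transform() sees the whole file.
    def _buffer_decode(self, data, errors, final):
        if not final:
            return "", 0
        return decode(data, errors)


def search(name):
    # Codec search function: only answer for our own (hypothetical) name.
    if name != CODEC_NAME:
        return None
    utf8 = codecs.lookup("utf-8")
    return codecs.CodecInfo(
        name=CODEC_NAME,
        encode=utf8.encode,  # encoding back to bytes is plain UTF-8
        decode=decode,
        incrementalencoder=utf8.incrementalencoder,
        incrementaldecoder=IncrementalDecoder,
    )


codecs.register(search)
```

Because the decoded text is what gets compiled and then cached as a .pyc, transform() only runs again when the source file changes.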

You don't have to re-invent the parser; you could use the tokenize
module (https://docs.python.org/3/library/tokenize.html), which would
do all the heavy lifting. Just look out for token.STRING tokens and
process the token string; it'll be the full string literal (including
any prefixes attached).
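Here is a minimal sketch of that tokenize approach. It assumes the goal is to add a b prefix to every string literal that has no b, u or f prefix, so that r'...' becomes br'...'. The function name add_bytes_prefix is hypothetical. The rewritten tokens keep their original start/end positions, which lets tokenize.untokenize() reproduce the surrounding whitespace. Since tokenize.tokenize() emits an ENCODING token first, untokenize() hands back bytes.

```python
import io
import token
import tokenize


def add_bytes_prefix(source: bytes) -> bytes:
    """Hypothetical helper: prefix unprefixed string literals in *source* with b."""

    def rewrite(tokens):
        for tok in tokens:
            if tok.type == token.STRING:
                # The prefix is the run of letters before the opening quote.
                body = tok.string.lstrip("bBrRuUfF")
                prefix = tok.string[: len(tok.string) - len(body)]
                if not set(prefix.lower()) & {"b", "u", "f"}:
                    # Only the text changes; positions stay as tokenized.
                    tok = tok._replace(string="b" + tok.string)
            yield tok

    tokens = tokenize.tokenize(io.BytesIO(source).readline)
    return tokenize.untokenize(rewrite(tokens))


print(add_bytes_prefix(b"x = 'abc'\ny = r'\\d'\nw = u'text'\n").decode())
```

Note that this also turns docstrings into bytes. A real hook would probably have to special-case those, along with any literals that must stay str.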

-- 
Martijn Pieters


More information about the Mercurial-devel mailing list