[PATCH] mail: take --encoding and HGENCODING into account

Sat Oct 8 04:59:00 EDT 2016

On Fri, 07 Oct 2016 09:56:10 -0500, Gábor Stefanik wrote:
> # HG changeset patch
> # User Gábor Stefanik <gabor.stefanik at nng.com>
> # Date 1475667922 -7200
> #      Wed Oct 05 13:45:22 2016 +0200
> # Node ID 31350841be0c6af1c335fb02b28b8fd1f79089b9
> # Parent  91a3c58ecf938ed675f5364b88f0d663f12b0047
> mail: take --encoding and HGENCODING into account

The new encoding strategy looks good. Can you update the tests and resend?

Also, I found a couple of nits. Please see the inline comments.

> --- a/mercurial/mail.py
> +++ b/mercurial/mail.py
> @@ -205,22 +205,40 @@
>  
>  def mimetextpatch(s, subtype='plain', display=False):
>      '''Return MIME message suitable for a patch.
> -    Charset will be detected as utf-8 or (possibly fake) us-ascii.
> +    Charset will be detected by first trying to decode as us-ascii, then utf-8,
> +    and finally the global encodings. If all those fail, fall back to
> +    ISO-8859-1, an encoding with that allows all byte sequences.
>      Transfer encodings will be used if necessary.'''
>  
> -    cs = 'us-ascii'
> +    def codec2iana(encoding):
> +        encoding = email.charset.Charset(encoding).input_charset.lower()
> +        
> +        if encoding.startswith("iso") and not encoding.startswith("iso-"):
> +            return "iso-" + encoding[3:]
> +        return encoding

- email.charset is a module. we need "import email.charset" in case
  it isn't imported yet.
- we generally define this kind of function at module scope, since it
  doesn't need to capture any local variables.
- better not to shadow the global "encoding" module.
- can you add a comment explaining why we have to fix 'iso' aliases?
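
For illustration, a rough sketch of what a module-scope version might look
like. The parameter name "codec" and the local "cs" are placeholders, and the
comment about the 'iso' fixup is only my guess at the reason (Python reports
some codecs as e.g. 'iso8859-1', without the hyphen after "iso"); please
replace it with the real motivation.

```python
import email.charset

def codec2iana(codec):
    '''Return a charset name for a Python codec name (sketch only).'''
    # Charset() resolves a few known aliases, e.g. 'latin-1' -> 'iso-8859-1'.
    cs = email.charset.Charset(codec).input_charset.lower()

    # Assumption: names such as 'iso8859-1' need the hyphen after "iso"
    # restored to match the registered 'iso-8859-1' spelling.
    if cs.startswith('iso') and not cs.startswith('iso-'):
        return 'iso-' + cs[3:]
    return cs
```

Since the parameter is no longer called "encoding", the global module of the
same name stays visible inside the function.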

> +    cs = "iso-8859-1" # a "safe" encoding with no invalid byte sequences
>      if not display:

This change is the main source of the test failure. Maybe we can move it
into the "not display" block.

>          try:
>              s.decode('us-ascii')
> +            cs = 'us-ascii'
>          except UnicodeDecodeError:
>              try:
>                  s.decode('utf-8')
>                  cs = 'utf-8'
>              except UnicodeDecodeError:
> -                # We'll go with us-ascii as a fallback.
> -                pass
> +                try:
> +                    s.decode(encoding.encoding)
> +                    cs = encoding.encoding
> +                except UnicodeDecodeError:
> +                    try: 
> +                        s.decode(encoding.fallbackencoding)
> +                        cs = encoding.fallbackencoding
> +                    except UnicodeDecodeError

SyntaxError: the "except UnicodeDecodeError" line is missing its trailing colon.

> +                        # fall back to ISO-8859-1
> +                        pass

Maybe it's time to rewrite these nested try/except blocks as a for loop?
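
Something along these lines, perhaps. This is only a sketch: the helper name
"detectcharset" and its parameters are made up for the example, and it is not
tested against the real tree. In mail.py the candidate list would presumably
be 'us-ascii', 'utf-8', encoding.encoding and encoding.fallbackencoding.

```python
def detectcharset(s, candidates, fallback='iso-8859-1'):
    '''Return the first charset in candidates that decodes s cleanly.

    s is a byte string. If no candidate works, return fallback, which
    should be an encoding that accepts every byte sequence.
    '''
    for cs in candidates:
        try:
            s.decode(cs)
            return cs
        except UnicodeDecodeError:
            # this candidate cannot represent s; try the next one
            continue
    return fallback

# Hypothetical usage with hard-coded candidates:
#   cs = detectcharset(s, ['us-ascii', 'utf-8', 'cp1250'])
```

That keeps the order of attempts in one list and leaves a single place where
the ISO-8859-1 fallback is chosen.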