[PATCH] mail: encode long unicode lines in emails properly (issue5687)

Yuya Nishihara yuya at tcha.org
Fri Sep 22 08:58:08 EDT 2017


On Thu, 21 Sep 2017 17:59:52 +0300, Ippolitov Igor wrote:
> # HG changeset patch
> # User Igor Ippolitov <iippolitov at gmail.com>
> # Date 1505982498 0
> #      Thu Sep 21 08:28:18 2017 +0000
> # Node ID 474d0bfc9809032668864cea16026f9b53d24a6d
> # Parent  05131c963767faaac6a66b2c658659bfbb4db29b
> mail: encode long unicode lines in emails properly (issue5687)
> 
> 3e544c074459 introduced a bug: emails Content-Transfer-Encoding
> is silently replaced with 'quoted-printable' while any other
> encoding could be used by underlying code. The problem is revealed
> when a long unicode line is encoded.
> 
> The patch implements proper check which works for any text and
> encoding.
> test-notify.t changed so that it would fail on unpatched code
> test-patchbomb.t changed due to email headers order change
> 
> The patch won't work for python 3.6 as it lacks email.set_charset()
> 
> diff -r 05131c963767 -r 474d0bfc9809 mercurial/mail.py
> --- a/mercurial/mail.py	Wed Sep 20 09:35:45 2017 -0700
> +++ b/mercurial/mail.py	Thu Sep 21 08:28:18 2017 +0000
> @@ -217,16 +217,22 @@
>      Quoted-printable transfer encoding will be used if necessary.
>      '''
>      enc = None
> +    cs = email.charset.Charset(charset)
> +    msg = email.MIMEText.MIMEText('', subtype)
> +    del msg['Content-Transfer-Encoding']
>      for line in body.splitlines():
>          if len(line) > 950:
>              body = quopri.encodestring(body)
> -            enc = "quoted-printable"
> +            cs.body_encoding = email.charset.QP

Perhaps body_encoding should be preserved if it's already set to BASE64. IIUC,
BASE64 is preferred for utf-8 content. So the easiest hack would be to just set
charset='iso-8859-1' if it was 'us-ascii' and a long line is detected.
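
To make the suggestion concrete, here is a rough sketch of that hack. It is
written against the Python 3 email package (email.mime.text), not the
email.MIMEText module used in the patch, and the function name
mimetext_latin1_fallback is made up for illustration. It assumes charset is
given as a plain string. The charset is switched only when it was 'us-ascii',
so a charset that already selects BASE64 (such as utf-8) is left alone.

```python
from email.mime.text import MIMEText

MAX_LINE = 950  # same threshold as in the patch


def mimetext_latin1_fallback(body, subtype, charset):
    """Hypothetical variant: let the email package choose the encoding."""
    has_long_line = any(len(line) > MAX_LINE for line in body.splitlines())
    if has_long_line and charset.lower() == 'us-ascii':
        # iso-8859-1 is registered with a quoted-printable body encoding,
        # so the email package wraps the long line on its own.
        charset = 'iso-8859-1'
    # No manual quopri call and no header surgery: the body is passed raw.
    return MIMEText(body, subtype, charset)
```

With this, a long ASCII body ends up quoted-printable, a short one stays 7bit,
and a utf-8 body keeps whatever the email package picks for utf-8.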

> +            enc = True
>              break
> +    if enc:
> +        msg.set_charset(cs)
> +        msg.set_payload(body)
> +    else:
> +        msg.set_payload(body)
> +        msg.set_charset(cs)

This if/else seems unnecessary if body is not encoded beforehand.
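
Something along these lines is what I have in mind: a single code path where
the body is handed over unencoded and the Charset object decides the transfer
encoding. Again this is only a sketch against the Python 3 email package, the
name mimetext_single_path is hypothetical, and overriding body_encoding only
when it is unset is my assumption about how to keep BASE64 intact.

```python
import email.charset
from email.mime.text import MIMEText

MAX_LINE = 950  # same threshold as in the patch


def mimetext_single_path(body, subtype, charset):
    """Hypothetical variant with one set_payload() call and no if/else."""
    cs = email.charset.Charset(charset)
    has_long_line = any(len(line) > MAX_LINE for line in body.splitlines())
    if has_long_line and cs.body_encoding is None:
        # Only charsets without a body encoding (e.g. us-ascii) get QP;
        # a charset already set to BASE64 is preserved.
        cs.body_encoding = email.charset.QP
    msg = MIMEText('', subtype)
    # Drop the default header so set_charset() re-derives it from cs.
    del msg['Content-Transfer-Encoding']
    # The raw body goes in; the email package applies cs.body_encoding.
    msg.set_payload(body, cs)
    return msg
```

Since the body is never run through quopri.encodestring() by hand, the order
of set_payload() and set_charset() no longer matters and the branch goes away.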

> -    msg = email.MIMEText.MIMEText(body, subtype, charset)
> -    if enc:
> -        del msg['Content-Transfer-Encoding']
> -        msg['Content-Transfer-Encoding'] = enc
>      return msg

