[PATCH] mail: encode long unicode lines in emails properly (issue5687)
Yuya Nishihara
yuya at tcha.org
Fri Sep 22 08:58:08 EDT 2017
On Thu, 21 Sep 2017 17:59:52 +0300, Ippolitov Igor wrote:
> # HG changeset patch
> # User Igor Ippolitov <iippolitov at gmail.com>
> # Date 1505982498 0
> # Thu Sep 21 08:28:18 2017 +0000
> # Node ID 474d0bfc9809032668864cea16026f9b53d24a6d
> # Parent 05131c963767faaac6a66b2c658659bfbb4db29b
> mail: encode long unicode lines in emails properly (issue5687)
>
> 3e544c074459 introduced a bug: the email's Content-Transfer-Encoding
> is silently replaced with 'quoted-printable' even though the
> underlying code may have applied a different encoding. The problem
> shows up when a long unicode line is encoded.
>
> The patch implements a proper check that works for any text and
> encoding.
> test-notify.t is changed so that it would fail on unpatched code.
> test-patchbomb.t is changed due to a change in email header order.
>
> The patch won't work on Python 3.6, as it lacks email.set_charset().
>
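Here is a minimal Python 3 sketch of the failure the commit message describes. It assumes `email.mime.text.MIMEText` stands in for the Python 2 `email.MIMEText` module, and `old_mimetextqp` is a hypothetical name for an approximation of the pre-patch logic.

The body is quoted-printable encoded by hand. MIMEText then applies the charset's own body encoding on top (BASE64 for utf-8). Finally the header is overwritten to claim quoted-printable.

```python
import quopri
from email.mime.text import MIMEText


def old_mimetextqp(body, subtype, charset):
    """Hypothetical Python 3 approximation of the pre-patch logic."""
    enc = None
    for line in body.splitlines():
        if len(line) > 950:
            # Pre-encode the whole body as quoted-printable (quopri works on bytes).
            body = quopri.encodestring(body.encode(charset)).decode('ascii')
            enc = 'quoted-printable'
            break
    # For utf-8, MIMEText base64-encodes the (already QP-encoded) body here.
    msg = MIMEText(body, subtype, charset)
    if enc:
        # Overwriting the header hides the base64 encoding applied above,
        # so the declared and actual transfer encodings no longer match.
        del msg['Content-Transfer-Encoding']
        msg['Content-Transfer-Encoding'] = enc
    return msg
```

With a long non-ASCII utf-8 line, the message claims quoted-printable. The payload is really base64, so decoding it as declared does not give the original bytes back.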
> diff -r 05131c963767 -r 474d0bfc9809 mercurial/mail.py
> --- a/mercurial/mail.py Wed Sep 20 09:35:45 2017 -0700
> +++ b/mercurial/mail.py Thu Sep 21 08:28:18 2017 +0000
> @@ -217,16 +217,22 @@
>      Quoted-printable transfer encoding will be used if necessary.
>      '''
>      enc = None
> +    cs = email.charset.Charset(charset)
> +    msg = email.MIMEText.MIMEText('', subtype)
> +    del msg['Content-Transfer-Encoding']
>      for line in body.splitlines():
>          if len(line) > 950:
>              body = quopri.encodestring(body)
> -            enc = "quoted-printable"
> +            cs.body_encoding = email.charset.QP
Perhaps body_encoding should be preserved if it's set to BASE64. IIUC, BASE64
is preferred for utf-8 content. So the easiest hack would be to just set
charset='iso-8859-1' if it was 'us-ascii' and a long line was detected.
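Roughly, that hack might look like the following Python 3 sketch. `mimetext_sketch` and `MAX_LINE_LENGTH` are hypothetical names, and the 950 limit is taken from the patch.

Only a us-ascii body with an over-long line is relabeled as iso-8859-1, whose default body encoding in `email.charset` is QP. Every other charset keeps its own default, e.g. BASE64 for utf-8.

```python
import email.charset
from email.mime.text import MIMEText

# Hypothetical constant mirroring the 950-character check in the patch.
MAX_LINE_LENGTH = 950


def mimetext_sketch(body, subtype, charset):
    """Return a text MIME part whose transfer encoding comes from email.charset."""
    cs = email.charset.Charset(charset)
    long_line = any(len(line) > MAX_LINE_LENGTH for line in body.splitlines())
    if long_line and cs.input_charset == 'us-ascii':
        # us-ascii has no body encoding (7bit), so long lines would go out
        # unwrapped; iso-8859-1 is an ASCII superset whose default is QP.
        charset = 'iso-8859-1'
    # No pre-encoding: MIMEText encodes the body according to the charset's
    # default body encoding and sets Content-Transfer-Encoding to match.
    return MIMEText(body, subtype, charset)
```

A long ASCII line is then sent as iso-8859-1 quoted-printable, a long utf-8 line stays base64, and short ASCII text stays 7bit.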
> +            enc = True
>              break
> +    if enc:
> +        msg.set_charset(cs)
> +        msg.set_payload(body)
> +    else:
> +        msg.set_payload(body)
> +        msg.set_charset(cs)
This if/else seems unnecessary if the body is not encoded beforehand.
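For illustration, here is a Python 3 sketch of a single code path, assuming the body is left unencoded. `build_part` and its `use_qp` flag are hypothetical, and `email.mime.text.MIMEText` replaces the Python 2 `email.MIMEText`.

Calling `set_payload()` with a Charset argument makes it call `set_charset()` internally. That one call both encodes the body and adds the matching Content-Transfer-Encoding header.

```python
import email.charset
from email.mime.text import MIMEText


def build_part(body, subtype, charset, use_qp):
    """Hypothetical helper: one code path, no pre-encoding of the body."""
    cs = email.charset.Charset(charset)
    if use_qp:
        # Override the charset's default body encoding (e.g. BASE64 for utf-8).
        cs.body_encoding = email.charset.QP
    msg = MIMEText('', subtype)
    # Drop the header MIMEText added for the empty payload so that
    # set_charset() recomputes it from cs.
    del msg['Content-Transfer-Encoding']
    # set_payload() with a charset calls set_charset(), which encodes the
    # raw body according to cs.body_encoding and adds the matching header.
    msg.set_payload(body, cs)
    return msg
```

In both branches the message carries exactly one Content-Transfer-Encoding header, and that header matches how the payload was actually encoded.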
> -    msg = email.MIMEText.MIMEText(body, subtype, charset)
> -    if enc:
> -        del msg['Content-Transfer-Encoding']
> -        msg['Content-Transfer-Encoding'] = enc
>      return msg