[PATCH] mail: encode long unicode lines in emails properly (issue5687)

Fri Sep 22 09:29:51 EDT 2017

My strong belief is that the software should rely fully on Python's 'email'
module to handle encodings and everything else. Just reuse the code (possibly
fixing it).
So from my point of view, the whole function should consist of just the
latter two lines (or at least contain no manual quopri calls).

> msg.set_payload()
> msg.set_charset()

But in this case I would prefer doing this in a 'robust' way, because I
know nothing about the initial issue.
Nor do I know anything about the internal 'email' machinery.

If this is acceptable, I can rewrite this patch to rely fully on Python's
email module and write a test to make sure the initial issue with long
lines is solved.
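
A rough sketch of what I have in mind, written against the Python 3 module
paths (email.charset, email.mime.nonmultipart) rather than the Python 2 ones
used in the patch. The function name mimetext_sketch and the reuse of the 950
limit are only assumptions for illustration, and nothing here has been tested
against Mercurial itself.

```python
import email.charset
from email.mime.nonmultipart import MIMENonMultipart

# Assumption: same long-line threshold as the loop in the patch.
MAX_LINE = 950


def mimetext_sketch(body, subtype="plain", charset="utf-8"):
    """Build a text part and let the email package do the transfer encoding.

    Hypothetical helper, not the actual mercurial.mail function.
    """
    cs = email.charset.Charset(charset)
    # Only override the charset's default body encoding when a line is too
    # long to be sent as-is; no manual quopri call is made.
    if any(len(line) > MAX_LINE for line in body.splitlines()):
        cs.body_encoding = email.charset.QP
    # MIMENonMultipart sets Content-Type and MIME-Version but no
    # Content-Transfer-Encoding, so there is nothing to delete afterwards.
    msg = MIMENonMultipart("text", subtype)
    # set_payload() with a Charset also calls set_charset(), which encodes
    # the body and adds the matching Content-Transfer-Encoding header.
    msg.set_payload(body, cs)
    return msg
```

With this approach the header and the payload encoding come from the same
Charset object, so they should not be able to disagree.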

2017-09-22 15:58 GMT+03:00 Yuya Nishihara <yuya at tcha.org>:

> On Thu, 21 Sep 2017 17:59:52 +0300, Ippolitov Igor wrote:
> > # HG changeset patch
> > # User Igor Ippolitov <iippolitov at gmail.com>
> > # Date 1505982498 0
> > #      Thu Sep 21 08:28:18 2017 +0000
> > # Node ID 474d0bfc9809032668864cea16026f9b53d24a6d
> > # Parent  05131c963767faaac6a66b2c658659bfbb4db29b
> > mail: encode long unicode lines in emails properly (issue5687)
> >
> > 3e544c074459 introduced a bug: emails Content-Transfer-Encoding
> > is silently replaced with 'quoted-printable' while any other
> > encoding could be used by underlying code. The problem is revealed
> > when a long unicode line is encoded.
> >
> > The patch implements proper check which works for any text and
> > encoding.
> > test-notify.t changed so that it would fail on unpatched code
> > test-patchbomb.t changed due to email headers order change
> >
> > The patch won't work for python 3.6 as it lacks email.set_charset()
> >
> > diff -r 05131c963767 -r 474d0bfc9809 mercurial/mail.py
> > --- a/mercurial/mail.py       Wed Sep 20 09:35:45 2017 -0700
> > +++ b/mercurial/mail.py       Thu Sep 21 08:28:18 2017 +0000
> > @@ -217,16 +217,22 @@
> >      Quoted-printable transfer encoding will be used if necessary.
> >      '''
> >      enc = None
> > +    cs = email.charset.Charset(charset)
> > +    msg = email.MIMEText.MIMEText('', subtype)
> > +    del msg['Content-Transfer-Encoding']
> >      for line in body.splitlines():
> >          if len(line) > 950:
> >              body = quopri.encodestring(body)
> > -            enc = "quoted-printable"
> > +            cs.body_encoding = email.charset.QP
>
> Perhaps body_encoding should be preserved if it's set to BASE64. IIUC,
> BASE64 is preferred for utf-8 content. So the easiest hack would be to
> just set charset='iso-8859-1' if it was 'us-ascii' and long line detected.
>
> > +            enc = True
> >              break
> > +    if enc:
> > +        msg.set_charset(cs)
> > +        msg.set_payload(body)
> > +    else:
> > +        msg.set_payload(body)
> > +        msg.set_charset(cs)
>
> This if/else seems unnecessary if body is not encoded beforehand.
>
> > -    msg = email.MIMEText.MIMEText(body, subtype, charset)
> > -    if enc:
> > -        del msg['Content-Transfer-Encoding']
> > -        msg['Content-Transfer-Encoding'] = enc
> >      return msg
>