[PATCH] mail: encode long unicode lines in emails properly (issue5687)

Igor Ippolitov iippolitov at gmail.com
Mon Sep 25 09:50:33 EDT 2017


There is a nice small piece of research from Django stating that QP is
preferred for UTF-8 text: https://code.djangoproject.com/ticket/22561
Base64 is preferred for binary data.
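A rough size comparison helps show the trade-off. It uses only the standard library (`quopri` and `base64`), and the two sample strings are invented for illustration, not taken from the Django ticket. For mostly-ASCII UTF-8 text, QP leaves most bytes readable and comes out shorter. When the text is dominated by multi-byte characters, every byte becomes a three-character `=XX` escape, and Base64 ends up shorter:

```python
import base64
import quopri


def encoded_sizes(data: bytes) -> tuple[int, int]:
    """Return (quoted-printable length, base64 length) for the same bytes."""
    qp = quopri.encodestring(data)  # soft-wraps lines at 76 characters
    b64 = base64.encodebytes(data)  # inserts a newline every 76 output characters
    return len(qp), len(b64)


# Illustrative samples: mostly-ASCII text vs. text made only of non-ASCII letters.
mostly_ascii = ("plain ascii line with one é\n" * 20).encode("utf-8")
mostly_non_ascii = ("é" * 600).encode("utf-8")

print(encoded_sizes(mostly_ascii))      # QP is the shorter encoding here
print(encoded_sizes(mostly_non_ascii))  # Base64 is the shorter encoding here
```

Readability of the raw message is the other usual argument for QP on text, since the ASCII parts stay legible in the encoded form.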

Concerning the "if enc: ... else:" part: I thought you didn't have to encode
long lines on your own, but this is still an issue, so this part should stay
as it is.
The 'set_charset' call sets the correct headers for a message (not only
'Content-Transfer-Encoding'). It also re-encodes the payload if no CTE header
is set yet. So for a pre-encoded payload it is called before the payload is
set, while there is still nothing to encode. If it is called afterwards, the
payload is encoded twice (as currently happens for long Unicode lines).
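The ordering argument can be checked with the standard `email` package alone. This is a sketch, not the Mercurial patch itself, and the helper name `build` and the sample body are illustrative assumptions. One message calls `set_charset` while the payload is still empty and then stores the pre-encoded QP text. The other stores the same text first and calls `set_charset` afterwards, which QP-encodes it a second time, so every `=` becomes `=3D`:

```python
import email.charset
import email.message
import quopri


def build(preencoded_qp: str, charset_first: bool) -> email.message.Message:
    """Hypothetical helper: attach an already QP-encoded body, calling
    set_charset either before or after set_payload."""
    cs = email.charset.Charset("utf-8")
    cs.body_encoding = email.charset.QP  # utf-8 defaults to BASE64
    msg = email.message.Message()
    if charset_first:
        # Payload is still empty: set_charset only adds the MIME-Version,
        # Content-Type and Content-Transfer-Encoding headers.
        msg.set_charset(cs)
        msg.set_payload(preencoded_qp)  # stored verbatim, no charset passed
    else:
        msg.set_payload(preencoded_qp)
        # No Content-Transfer-Encoding header yet, so set_charset encodes
        # the already encoded payload again.
        msg.set_charset(cs)
    return msg


body = ("é" * 600).encode("utf-8")  # a single 1200-byte line
encoded = quopri.encodestring(body).decode("ascii")

good = build(encoded, charset_first=True)
bad = build(encoded, charset_first=False)

print(good.get_payload(decode=True) == body)  # decodes back to the original
print("=3D" in bad.get_payload())             # "=" was escaped a second time
```

Both messages end up with the same headers. Only the payload of the second one is wrong.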

So I don't see any reason to change the patch for now.


2017-09-25 16:38 GMT+03:00 Yuya Nishihara <yuya at tcha.org>:

> On Mon, 25 Sep 2017 10:06:44 +0300, Igor Ippolitov wrote:
> > I dug a little more and finally realized that the patch will work nicely
> > in Python 3.6.
> > The compat32 we discussed on IRC is not a module but an API, which is in
> > the class. This API includes the set_payload and set_charset methods, so
> > nothing will be broken.
> >
> > Additionally, I tested that the initial issue with long lines is not
> > solved, so you have to perform some additional checks when encoding the
> > body (so there is no 'simple' solution).
> >
> > In the end, I believe the submitted patch is the best solution for now:
> > it solves all issues and makes minimal changes.
>
> Okay, thanks for investigating that.
>
> > >> >          if len(line) > 950:
> > >> >              body = quopri.encodestring(body)
> > >> > -            enc = "quoted-printable"
> > >> > +            cs.body_encoding = email.charset.QP
> > >>
> > >> Perhaps body_encoding should be preserved if it's set to BASE64.
> > >> IIUC, BASE64 is preferred for utf-8 content.
>
> Any thoughts on this? Do you think QP is preferred for UTF-8 over the
> default BASE64 encoding?
>
> > >> > +            enc = True
> > >> >              break
> > >> > +    if enc:
> > >> > +        msg.set_charset(cs)
> > >> > +        msg.set_payload(body)
> > >> > +    else:
> > >> > +        msg.set_payload(body)
> > >> > +        msg.set_charset(cs)
> > >>
> > >> This if/else seems unnecessary if body is not encoded beforehand.
>
> Can you fix this? IIUC, we can just call msg.set_payload(body, charset) if
> body isn't encoded in QP manually.
>
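Yuya's suggestion of letting the `email` package do the QP encoding itself through `msg.set_payload(body, charset)` could look roughly like this. It is a sketch under assumptions: the helper name `text_message` is hypothetical, `body` is assumed to be a `str`, and the 950 threshold is copied from the quoted patch. The charset switches to QP only when some line is too long; otherwise the `utf-8` default (BASE64) is kept. No manual `quopri` call or `if`/`else` ordering is needed, because the body is encoded exactly once inside `set_charset`:

```python
import email.charset
import email.message

MAX_LINE = 950  # threshold taken from the quoted patch


def text_message(body: str, subtype: str = "plain",
                 charset: str = "utf-8") -> email.message.Message:
    """Hypothetical helper: let the email package encode the body itself."""
    cs = email.charset.Charset(charset)
    msg = email.message.Message()
    msg.set_type("text/" + subtype)
    # Counts characters of the str; the UTF-8 encoded line may be longer in bytes.
    if any(len(line) > MAX_LINE for line in body.splitlines()):
        cs.body_encoding = email.charset.QP  # QP wraps long lines with soft breaks
    # set_payload() encodes the str with the charset, then calls set_charset(),
    # which applies the body encoding and adds Content-Transfer-Encoding.
    msg.set_payload(body, cs)
    return msg


short_msg = text_message("héllo wörld\n")
long_msg = text_message("é" * 1000 + "\n")
print(short_msg["Content-Transfer-Encoding"])  # base64 (the utf-8 default)
print(long_msg["Content-Transfer-Encoding"])   # quoted-printable
```

Whether a manual `quopri` step is still needed for some Python 2 code path is the open question in the thread; this sketch only covers the Python 3 `email` API.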


More information about the Mercurial-devel mailing list