[Bug 5687] New: Long unicode lines are encoded incorrectly into email

mercurial-bugs at mercurial-scm.org mercurial-bugs at mercurial-scm.org
Thu Sep 21 09:04:31 UTC 2017


https://bz.mercurial-scm.org/show_bug.cgi?id=5687

            Bug ID: 5687
           Summary: Long unicode lines are encoded incorrectly into email
           Product: Mercurial
           Version: stable branch
          Hardware: PC
                OS: All
            Status: UNCONFIRMED
          Severity: bug
          Priority: wish
         Component: Mercurial
          Assignee: bugzilla at mercurial-scm.org
          Reporter: iippolitov at gmail.com
                CC: mercurial-devel at mercurial-scm.org

Created attachment 1975
  --> https://bz.mercurial-scm.org/attachment.cgi?id=1975&action=edit
mail.py patch

If you commit a file containing long Unicode lines, the notification email
will most likely contain a base64-encoded blob instead of a readable patch.
The message header declares a "Content-Transfer-Encoding" of
"quoted-printable", but the body is actually base64 encoded.
Decoding that blob yields the quoted-printable-encoded patch.

The problem is in mercurial/mail.py module:

>  def mimetextqp(body, subtype, charset):
>      '''Return MIME message.
>      Quoted-printable transfer encoding will be used if necessary.
>      '''
>      enc = None
>      for line in body.splitlines():
>          if len(line) > 950:
>              body = quopri.encodestring(body)
>              enc = "quoted-printable"
>              break
>  
>      msg = email.MIMEText.MIMEText(body, subtype, charset)
>      if enc:
>          del msg['Content-Transfer-Encoding']
>          msg['Content-Transfer-Encoding'] = enc
>      return msg

email.MIMEText.MIMEText applies the default encoder, which sometimes uses
base64 instead of quoted-printable to encode the text (e.g. when 'charset'
equals 'utf8'). In that case the mail module produces a message that is first
quoted-printable encoded, then base64 encoded, and then the base64 header is
silently dropped.
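
The sequence described above can be sketched with the Python 3 standard
library. This is only an illustration, not Mercurial's code: it assumes the
Python 3 email package (email.mime.text.MIMEText) behaves like the Python 2
email.MIMEText module quoted above, and the sample body is made up.

```python
import base64
import quopri
from email.mime.text import MIMEText

# Hypothetical body: one line longer than 950 characters with non-ASCII text.
body = "ф" * 1000 + "\n"

# Step 1: the long line triggers manual quoted-printable encoding.
qp_body = quopri.encodestring(body.encode("utf-8"))

# Step 2: MIMEText applies the charset's default body encoding on top of it.
msg = MIMEText(qp_body.decode("ascii"), "plain", "utf-8")
default_cte = msg["Content-Transfer-Encoding"]

# Step 3: the header is overwritten, but the payload is left untouched.
del msg["Content-Transfer-Encoding"]
msg["Content-Transfer-Encoding"] = "quoted-printable"

# The payload is still base64; decoding it gives back the quoted-printable text.
inner = base64.b64decode(msg.get_payload())
recovered = quopri.decodestring(inner).decode("utf-8")
```

Here default_cte is the encoding MIMEText chose on its own, while the final
header claims quoted-printable; a mail client that trusts the header would
display the base64 text verbatim.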

The double encoding can be reproduced by changing test-notify.t to add a
single non-ASCII UTF-8 character to the long-line test (I used the Russian
letter "ф").

I'm not sure whether you really still have to encode emails on your own today
(the original change is from 2009 and many things have changed since then).
In any case, please find the patch for this issue attached.
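
For illustration only, and not the contents of the attached patch: one way to
let the email package do the encoding itself is to pass a Charset whose body
encoding is set to quoted-printable. The sketch below assumes Python 3.5 or
later, where MIMEText accepts a Charset instance; the sample body is
hypothetical.

```python
from email.charset import QP, Charset
from email.mime.text import MIMEText

# Hypothetical body with one very long non-ASCII line.
body = "ф" * 1000 + "\n"

# Ask for quoted-printable instead of the utf-8 default (base64).
qp_charset = Charset("utf-8")
qp_charset.body_encoding = QP

# MIMEText encodes the payload once and sets a matching header.
msg = MIMEText(body, "plain", qp_charset)

# get_payload(decode=True) undoes the transfer encoding named in the header.
decoded = msg.get_payload(decode=True).decode("utf-8")
```

With this approach the header and the payload are produced by the same call,
so they cannot disagree the way they do after the manual header swap.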

Feel free to ask additional questions.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

