[PATCH] patchbomb: Encode overly long lines

Rocco Rutte pdmef at gmx.net
Fri May 8 07:12:00 CDT 2009


Hi,

* Martin Geisler wrote:
> Rocco Rutte <pdmef at gmx.net> writes:

> We had a discussion about this just the other day:
>
>   http://markmail.org/message/pmo5dlro2vqpuvcp

Yes, that's my motivation when trying to fix it.

> There the symptom was that lines were broken after 990 chars, so maybe
> we should break a bit earlier to be on the safe side? The RFC says that
> lines SHOULD be no more than 78 chars.

Yes. But I don't think it makes sense to go much lower than 990. The more often
patchbomb qp-encodes a mail, the more often people will have to decode it first.
A ~1000-byte line is a rather rare edge case, I'd say.
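
To make the tradeoff concrete, here is a minimal Python 3 sketch of the length check. The names needs_qp, RFC_HARD_LIMIT and RFC_SOFT_LIMIT are made up for illustration and are not Mercurial code. It assumes the RFC 2822/5322 limits: a line MUST be at most 998 characters and SHOULD be at most 78, excluding the CRLF. Using 78 as the threshold would make ordinary patches qualify for qp encoding, which is the readability cost described above.

```python
# Hypothetical helper (not Mercurial code): does any line of a message body
# exceed the given limit? Limits follow RFC 2822/5322 and exclude the CRLF.
RFC_HARD_LIMIT = 998  # lines MUST NOT be longer than this
RFC_SOFT_LIMIT = 78   # lines SHOULD NOT be longer than this


def needs_qp(body: bytes, limit: int = RFC_HARD_LIMIT) -> bool:
    """Return True if some line of body is longer than limit bytes."""
    return any(len(line) > limit for line in body.splitlines())


# A patch containing one 1001-byte line (e.g. a minified file).
long_patch = b"+" + b"x" * 1000 + b"\n context\n"
# An ordinary patch whose longest line is 80 bytes.
normal_patch = b"+" + b"y" * 79 + b"\n"

print(needs_qp(long_patch))                    # True
print(needs_qp(normal_patch))                  # False
print(needs_qp(normal_patch, RFC_SOFT_LIMIT))  # True: 80 > 78
```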

> > diff --git a/mercurial/mail.py b/mercurial/mail.py
> > --- a/mercurial/mail.py
> > +++ b/mercurial/mail.py
> > @@ -6,7 +6,7 @@
> >  # GNU General Public License version 2, incorporated herein by reference.
> >  
> >  from i18n import _
> > -import os, smtplib, socket
> > +import os, smtplib, socket, quopri
> 
> There is an email.Encoders.encode_quopri function which does the same and
> sets the Content-Transfer-Encoding header at the same time.

I tried it and decided to roll my own. First, it leaves a
Content-Transfer-Encoding header of "7bit" for me in addition to the qp
one (so I'd have to remove one anyway). Second, it also qp-encodes
spaces and tabs, which adds quite a lot of bloat and renders the text
completely unreadable for humans.
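
The whitespace difference can be reproduced with Python 3's quopri module. This sketch assumes, as described above, that encode_quopri quotes every space and tab; it models that behaviour with quopri's quotetabs=True, while the default quotetabs=False (what the patch relies on via quopri.encodestring) leaves embedded whitespace readable. In both modes an overly long line is wrapped with "=" soft line breaks, and decoding restores the original bytes.

```python
import quopri

line = b"\tif len(line) > 998:  # tab and spaces\n"

readable = quopri.encodestring(line)                 # default: quotetabs=False
bloated = quopri.encodestring(line, quotetabs=True)  # quote every space and tab

print(readable == line)  # True: nothing in this line needed quoting
print(bloated)           # the tab becomes =09, every space becomes =20

# An overly long line is split with soft line breaks ("=" at end of line),
# none of which is longer than 76 characters.
wrapped = quopri.encodestring(b"x" * 1000 + b"\n")
print(max(len(part) for part in wrapped.splitlines()) <= 76)  # True
print(quopri.decodestring(wrapped) == b"x" * 1000 + b"\n")    # True
```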

> >  import email.Header, email.MIMEText, email.Utils
> >  import util, encoding
> >  
> > @@ -88,14 +88,37 @@ def validateconfig(ui):
> >  
> >  def mimetextpatch(s, subtype='plain', display=False):
> >      '''If patch in utf-8 transfer-encode it.'''
> > +
> > +    def encode_qp(str):
> > +        for line in str.split('\n'):
> > +            if len(line) > 998:
> > +                return quopri.encodestring(str), "quoted-printable"
> > +        return str, None
> > +
> > +    passed = False
> >      if not display:
> >          for cs in ('us-ascii', 'utf-8'):
> >              try:
> >                  s.decode(cs)
> > -                return email.MIMEText.MIMEText(s, subtype, cs)
> > +                s, enc = encode_qp(s)
> > +                passed = True
> > +                msg = email.MIMEText.MIMEText(s, subtype, cs)
> > +                if enc is not None:
> > +                    del msg['Content-Transfer-Encoding']
> > +                    msg['Content-Transfer-Encoding'] = enc
> > +                return msg
> >              except UnicodeDecodeError:
> >                  pass
> > -    return email.MIMEText.MIMEText(s, subtype)
> > +
> > +    if passed:
> > +        return email.MIMEText.MIMEText(s, subtype)
> > +
> > +    s, enc = encode_qp(s)
> > +    msg = email.MIMEText.MIMEText(s, subtype)
> > +    if enc is not None:
> > +        del msg['Content-Transfer-Encoding']
> > +        msg['Content-Transfer-Encoding'] = enc
> > +    return msg
> 
> It's getting late here, but is the above code not sort of repeated? :-)

Yes, because the former creates a message with a charset and the latter
does not, and I don't want to hardcode 'us-ascii' for the display
case. Or do you mean moving that code into a separate function that uses
if-else to create the message?
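
For the second reading, a Python 3 sketch of that refactoring might look like this. It is not a drop-in replacement for the Python 2 patch: only mimetextpatch mirrors a name from the patch, while _encode_qp, _mimetext and QP_LIMIT are hypothetical. One helper does the length check and builds the part; the two call sites differ only in whether a charset is passed, so no passed flag is needed. Because Python 3's MIMEText base64-encodes utf-8 text by default, the sketch builds qp-encoded parts as us-ascii (the qp output is pure ASCII) and then relabels the charset, instead of swapping the header on a utf-8 part. As a simplification, undecodable bytes in the no-charset path are replaced.

```python
import quopri
from email.mime.text import MIMEText

QP_LIMIT = 998  # RFC 2822/5322 hard limit per line, excluding CRLF


def _encode_qp(body: bytes):
    """Return (body, cte); qp-encode only if some line is too long."""
    if any(len(line) > QP_LIMIT for line in body.splitlines()):
        return quopri.encodestring(body), "quoted-printable"
    return body, None


def _mimetext(body: bytes, subtype: str, charset=None) -> MIMEText:
    """One construction path for both the charset and no-charset cases."""
    body, cte = _encode_qp(body)
    if cte is None:
        # Simplification: undecodable bytes in the no-charset path are replaced.
        text = body.decode(charset or "utf-8", "replace")
        return MIMEText(text, subtype, charset)  # None: MIMEText picks one
    # qp output is pure ASCII: build a 7bit us-ascii part (payload untouched),
    # then relabel the charset and the transfer encoding.
    msg = MIMEText(body.decode("ascii"), subtype, "us-ascii")
    msg.set_param("charset", charset or "us-ascii")
    msg.replace_header("Content-Transfer-Encoding", cte)
    return msg


def mimetextpatch(body: bytes, subtype="plain", display=False) -> MIMEText:
    """If the patch is us-ascii or utf-8, label it with that charset."""
    if not display:
        for cs in ("us-ascii", "utf-8"):
            try:
                body.decode(cs)
            except UnicodeDecodeError:
                continue
            return _mimetext(body, subtype, cs)
    return _mimetext(body, subtype)


msg = mimetextpatch("é".encode("utf-8") * 600 + b"\n")
print(msg.get_content_charset())         # utf-8
print(msg["Content-Transfer-Encoding"])  # quoted-printable
```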

> Also, if passed is set to True, then msg will always have been returned
> from the loop -- or can MIMEText also throw a UnicodeDecodeError?

Ah, passed=True should come first in case decode() throws an error.

Rocco


More information about the Mercurial-devel mailing list