[PATCH] patchbomb: Encode overly long lines
Rocco Rutte
pdmef at gmx.net
Fri May 8 07:12:00 CDT 2009
Hi,
* Martin Geisler wrote:
> Rocco Rutte <pdmef at gmx.net> writes:
> We had a discussion about this just the other day:
>
> http://markmail.org/message/pmo5dlro2vqpuvcp
Yes, that's my motivation when trying to fix it.
> There the symptom was that lines were broken after 990 chars, so maybe
> we should break a bit earlier to be on the safe side? The RFC says that
> lines SHOULD be no more than 78 chars.
Yes. But I don't think it makes sense to go much lower than 990. The more often
patchbomb qp-encodes a mail, the more often people will have to decode it first.
A ~1000 byte long line is a rather rare edge case, I'd say.
> > diff --git a/mercurial/mail.py b/mercurial/mail.py
> > --- a/mercurial/mail.py
> > +++ b/mercurial/mail.py
> > @@ -6,7 +6,7 @@
> > # GNU General Public License version 2, incorporated herein by reference.
> >
> > from i18n import _
> > -import os, smtplib, socket
> > +import os, smtplib, socket, quopri
>
> There is a email.Encoders.encode_quopri function which does the same and
> sets the Content-Transfer-Encoding header at the same time.
I tried it and decided to roll my own. The problem is that it leaves a
Content-Transfer-Encoding header of "7bit" for me in addition to the qp
one (so I'd have to remove one anyways). Second, it also qp-encodes
spaces and tabs which is quite some bloat and renders the text
completely unreadable for humans.
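To illustrate the readability point, here is a minimal sketch using the standard quopri module. It assumes Python 3 (where encodestring takes bytes), whereas the patch itself targets Python 2; the sample line is only an illustration.

```python
import quopri

line = b"diff --git a/mercurial/mail.py b/mercurial/mail.py\n"

# Default behaviour: spaces and tabs inside a line stay literal, so a
# short ASCII line passes through unchanged and remains readable.
plain = quopri.encodestring(line)

# quotetabs=True also encodes embedded spaces and tabs (=20 / =09),
# which is the kind of bloat described above.
quoted = quopri.encodestring(line, quotetabs=True)

# An overly long line is folded using soft line breaks ("=" at the end
# of each physical line) and decodes back to the original bytes.
long_line = b"x" * 1000 + b"\n"
folded = quopri.encodestring(long_line)
restored = quopri.decodestring(folded)
```

With the default settings only the overly long lines change shape; everything else in the patch stays as it was.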
> > import email.Header, email.MIMEText, email.Utils
> > import util, encoding
> >
> > @@ -88,14 +88,37 @@ def validateconfig(ui):
> >
> >  def mimetextpatch(s, subtype='plain', display=False):
> >      '''If patch in utf-8 transfer-encode it.'''
> > +
> > +    def encode_qp(str):
> > +        for line in str.split('\n'):
> > +            if len(line) > 998:
> > +                return quopri.encodestring(str), "quoted-printable"
> > +        return str, None
> > +
> > +    passed = False
> >      if not display:
> >          for cs in ('us-ascii', 'utf-8'):
> >              try:
> >                  s.decode(cs)
> > -                return email.MIMEText.MIMEText(s, subtype, cs)
> > +                s, enc = encode_qp(s)
> > +                passed = True
> > +                msg = email.MIMEText.MIMEText(s, subtype, cs)
> > +                if enc is not None:
> > +                    del msg['Content-Transfer-Encoding']
> > +                    msg['Content-Transfer-Encoding'] = enc
> > +                return msg
> >              except UnicodeDecodeError:
> >                  pass
> > -    return email.MIMEText.MIMEText(s, subtype)
> > +
> > +    if passed:
> > +        return email.MIMEText.MIMEText(s, subtype)
> > +
> > +    s, enc = encode_qp(s)
> > +    msg = email.MIMEText.MIMEText(s, subtype)
> > +    if enc is not None:
> > +        del msg['Content-Transfer-Encoding']
> > +        msg['Content-Transfer-Encoding'] = enc
> > +    return msg
>
> It's getting late here, but is the above code not sort of repeated? :-)
Yes, because the former creates a message with charset and the second
does not and I don't want to hardcode 'us-ascii' for the display
case. Or do you mean to move that code to a separate function that uses
if-else to create a message?
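A rough sketch of what such a separate function could look like. This is not the patch: it assumes Python 3 (email.mime.text instead of email.MIMEText), ASCII-only input, and the names build_message and LIMIT are made up for illustration.

```python
import quopri
from email.mime.text import MIMEText

LIMIT = 998  # same threshold as in the patch


def encode_qp(text):
    """Return (payload, cte); cte is None if no line exceeds LIMIT."""
    if any(len(line) > LIMIT for line in text.split('\n')):
        # Assumption: text is pure ASCII, so the byte round trip is safe.
        encoded = quopri.encodestring(text.encode('ascii')).decode('ascii')
        return encoded, 'quoted-printable'
    return text, None


def build_message(text, subtype='plain', charset=None):
    """Hypothetical helper: one place that builds the message, with or
    without an explicit charset, and fixes up the header if needed."""
    payload, cte = encode_qp(text)
    if charset is None:
        msg = MIMEText(payload, subtype)
    else:
        msg = MIMEText(payload, subtype, charset)
    if cte is not None:
        # MIMEText already set a Content-Transfer-Encoding header;
        # drop it first so only the qp one remains.
        del msg['Content-Transfer-Encoding']
        msg['Content-Transfer-Encoding'] = cte
    return msg
```

Both call sites would then reduce to build_message(s, subtype, cs) and build_message(s, subtype), which removes the duplicated header handling.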
> Also, if passed is set to True, then msg will always have been returned
> from the loop -- or can MIMEText also throw a UnicodeDecodeError?
Ah, passed=True should come first in case decode() throws an error.
Rocco