[PATCH] patchbomb: Encode overly long lines

Martin Geisler mg at lazybytes.net
Fri May 8 11:27:09 CDT 2009


Rocco Rutte <pdmef at gmx.net> writes:

> * Martin Geisler wrote:
>> Rocco Rutte <pdmef at gmx.net> writes:
>
>> We had a discussion about this just the other day:
>>
>>   http://markmail.org/message/pmo5dlro2vqpuvcp
>
> Yes, that's my motivation when trying to fix it.

Super!

>> There the symptom was that lines were broken after 990 chars, so
>> maybe we should break a bit earlier to be on the safe side? The RFC
>> says that lines SHOULD be no more than 78 chars.
>
> Yes. But I don't think it makes sense to go much lower than 990. The
> more often patchbomb qp-encodes a mail, the more often people will
> have to decode it first. A ~1000 byte long line is a rather rare
> edge case, I'd say.

Definitely. So maybe we should encode if we get lines longer than,
say, 950 characters?

>> There is a email.Encoders.encode_quopri function which does the
>> same and sets the Content-Transfer-Encoding header at the same
>> time.
>
> I tried it and decided to roll my own. The problem is that it leaves
> a Content-Transfer-Encoding header of "7bit" for me in addition to
> the qp one (so I'd have to remove one anyways). Second, it also
> qp-encodes spaces and tabs which is quite some bloat and renders the
> text completely unreadable for humans.

Aha, I had no idea about that... Great that you tried it :-)
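
For reference, the plain quopri module seems to leave embedded spaces
and tabs alone unless asked otherwise. A small sketch, written in
Python 3 syntax (this thread targets Python 2) and using a made-up
sample text:

```python
import quopri

# Made-up sample: one short line with a space and a tab, one overly long line.
sample = b"a b\tc\n" + b"x" * 2000 + b"\n"

# Default (quotetabs=False): embedded spaces and tabs stay readable.
plain = quopri.encodestring(sample)

# quotetabs=True: spaces and tabs are encoded as well.
quoted = quopri.encodestring(sample, quotetabs=True)

# Both variants should decode back to the original bytes.
roundtrip_ok = (quopri.decodestring(plain) == sample
                and quopri.decodestring(quoted) == sample)

# Longest line after encoding; soft line breaks split the long line.
longest = max(len(line) for line in plain.split(b"\n"))
```

So hand-rolled encoding with the default settings keeps ordinary patch
text readable, while the long line ends up far below the 950 limit.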

>> It's getting late here, but is the above code not sort of repeated? :-)
>
> Yes, because the former creates a message with charset and the
> second does not and I don't want to hardcode 'us-ascii' for the
> display case. Or do you mean to move that code to a separate
> function that uses if-else to create a message?

I looked at the MIMEText class and its constructor has this signature
(in Python 2.4 at least):

    def __init__(self, _text, _subtype='plain', _charset='us-ascii'):

So it should be fine to pass 'us-ascii' as a charset. With that in
mind, I think this should be equivalent to your patch:

def mimetextpatch(s, subtype='plain', display=False):
    '''If patch in utf-8 transfer-encode it.'''

    enc = None
    for line in s.split('\n'):
        if len(line) > 950:
            s = quopri.encodestring(s)
            enc = "quoted-printable"
            break

    cs = 'us-ascii'
    if not display:
        try:
            s.decode('us-ascii')
        except UnicodeDecodeError:
            try:
                s.decode('utf-8')
                cs = 'utf-8'
            except UnicodeDecodeError:
                # We'll go with us-ascii as a fallback.
                pass

    msg = email.MIMEText.MIMEText(s, subtype, cs)
    if enc:
        del msg['Content-Transfer-Encoding']
        msg['Content-Transfer-Encoding'] = enc
    return msg
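
The header handling at the end can be checked in isolation. This is
only a sketch in Python 3 syntax (email.mime.text instead of
email.MIMEText) with a made-up body, not the patchbomb code itself:

```python
import quopri
from email.mime.text import MIMEText

# Made-up body containing one overly long line.
body = b"short line\n" + b"y" * 1200 + b"\n"

# Transfer-encode by hand; the result is pure ASCII.
encoded = quopri.encodestring(body).decode("ascii")

msg = MIMEText(encoded, "plain", "us-ascii")
# The constructor has already set a Content-Transfer-Encoding header, and
# assigning a header appends rather than replaces, so delete the old one first.
del msg["Content-Transfer-Encoding"]
msg["Content-Transfer-Encoding"] = "quoted-printable"

# Exactly one header should remain, and decoding should restore the body.
headers = msg.get_all("Content-Transfer-Encoding")
decoded = msg.get_payload(decode=True)
```

Without the del, the message would carry both the original header and
the quoted-printable one.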

I tried analyzing the cases and came to the conclusion that the
charset is set to 'utf-8' if and only if we're not displaying the
patch, we're not able to decode with 'us-ascii' and we are able to
decode with 'utf-8'. Like this, except that it doesn't work due to the
exceptions:

    if not display and not s.decode('us-ascii') and s.decode('utf-8'):
        cs = 'utf-8'
    else:
        cs = 'us-ascii'
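
One way to make that condition work is to wrap the decode attempt in a
small predicate, since decode() raises on failure and returns the
decoded string rather than a boolean. A sketch in Python 3 syntax; the
names candecode and pickcharset are made up for illustration:

```python
def candecode(data, charset):
    """Return True if the bytes in data decode cleanly with charset."""
    try:
        data.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


def pickcharset(data, display=False):
    """Pick 'utf-8' only when not displaying, not ASCII, and valid UTF-8."""
    if (not display and not candecode(data, 'us-ascii')
            and candecode(data, 'utf-8')):
        return 'utf-8'
    # Everything else falls back to us-ascii, as in the function above.
    return 'us-ascii'
```

For example, pickcharset(b'abc') gives 'us-ascii', and UTF-8 encoded
non-ASCII text gives 'utf-8' unless display is set.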

Do you think that looks okay?

-- 
Martin Geisler

VIFF (Virtual Ideal Functionality Framework) brings easy and efficient
SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/.

