[PATCH] patchbomb: Encode overly long lines

Rocco Rutte pdmef at gmx.net
Fri May 8 12:04:53 CDT 2009


Hi,

* Martin Geisler wrote:
> Rocco Rutte <pdmef at gmx.net> writes:

> > Yes. But I don't think it makes sense to go much lower than 990. The
> > more often patchbomb qp-encodes a mail, the more often people will
> > have to decode it first. A ~1000 byte long line is a rather rare
> > edge case, I'd say.

> Definitely. So maybe we should encode if we get lines longer than,
> say, 950 characters?

Yeah, sure.
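
For illustration, a minimal sketch of such a length check. The helper
name needs_qp and the Python 3 bytes handling are assumptions for this
example, not code from the patch; the default of 950 is the threshold
suggested above.

```python
def needs_qp(body, limit=950):
    """Return True if any line in body (bytes) is longer than limit bytes.

    Hypothetical helper: the name and signature are made up for this sketch.
    """
    # splitlines() drops the line terminators, so only content is counted.
    return any(len(line) > limit for line in body.splitlines())
```

With this, a patch is only qp-encoded when at least one line crosses
the limit, so ordinary patches stay readable as plain text.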

> >> There is an email.Encoders.encode_quopri function which does the
> >> same and sets the Content-Transfer-Encoding header at the same
> >> time.

> > I tried it and decided to roll my own. The problem is that it leaves
> > a Content-Transfer-Encoding header of "7bit" for me in addition to
> > the qp one (so I'd have to remove one anyways). Second, it also
> > qp-encodes spaces and tabs which is quite some bloat and renders the
> > text completely unreadable for humans.

> Aha, I had no idea about that... Great that you tried it :-)

I think I'll need to file some bug reports against Python.
Unfortunately, there are more problems than just that.
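
To illustrate the hand-rolled approach discussed above, here is a rough
sketch using the standard quopri module. It is written for Python 3, and
the function name set_qp_payload is hypothetical; it is not the code
from the patch. quopri.encodestring leaves embedded spaces and tabs
alone by default (quotetabs=False), and any existing
Content-Transfer-Encoding header is deleted before the new one is set,
so only one remains.

```python
import quopri
from email.message import Message


def set_qp_payload(msg, body):
    """Store body (bytes) in msg as quoted-printable text.

    Hypothetical helper for illustration only.
    """
    # quotetabs defaults to False: embedded spaces and tabs stay readable.
    encoded = quopri.encodestring(body)
    # Deleting a header that is absent is not an error; this removes any
    # earlier value such as "7bit" so that exactly one header is left.
    del msg['Content-Transfer-Encoding']
    msg['Content-Transfer-Encoding'] = 'quoted-printable'
    msg.set_payload(encoded.decode('ascii'))
    return msg
```

Long lines come out split with soft line breaks ("=" at the end of a
line), which a mail client joins again when decoding.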

> So it should be fine to pass 'us-ascii' as a charset. With that in
> mind, I think this should be equivalent to your patch:

Yes it is and it works and I find it slightly more readable.

> I tried analyzing the cases and came to the conclusion that the
> charset is set to 'utf-8' if and only if we're not displaying the
> patch, we're not able to decode with 'us-ascii' and we are able to
> decode with 'utf-8'. Like this, except that it doesn't work due to the
> exceptions:

>     if not display and not s.decode('us-ascii') and s.decode('utf-8'):
>         cs = 'utf-8'
>     else:
>         cs = 'us-ascii'

> Do you think that looks okay?

Yes. I'll see if I can produce some tests to verify the various cases 
with non-ascii and broken utf-8 plus combinations of these.
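
As a sketch of how the condition quoted above could be written so that
the decode exceptions are handled, something like the following might
work. The function name pick_charset and the use of Python 3 bytes are
assumptions for this example; the variables s and display are taken
from the quoted snippet.

```python
def pick_charset(s, display=False):
    """Return 'utf-8' only if s (bytes) is not displayed, is not pure
    ASCII, and is valid UTF-8; otherwise return 'us-ascii'.

    Hypothetical helper for illustration only.
    """
    if display:
        return 'us-ascii'
    try:
        s.decode('us-ascii')
        return 'us-ascii'      # pure ASCII, nothing more to check
    except UnicodeDecodeError:
        pass
    try:
        s.decode('utf-8')
        return 'utf-8'         # non-ASCII but valid UTF-8
    except UnicodeDecodeError:
        return 'us-ascii'      # broken UTF-8 falls back to the default
```

The same cases could serve as a starting point for the tests: plain
ASCII, valid UTF-8 and broken UTF-8, each with and without display.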

Rocco


More information about the Mercurial-devel mailing list