[PATCH] patchbomb: Encode overly long lines
Rocco Rutte
pdmef at gmx.net
Fri May 8 12:04:53 CDT 2009
Hi,
* Martin Geisler wrote:
> Rocco Rutte <pdmef at gmx.net> writes:
> > Yes. But I don't think it makes sense to go much lower than 990. The
> > more often patchbomb qp-encodes a mail, the more often people will
> > have to decode it first. A ~1000 byte long line is a rather rare
> > edge case, I'd say.
> Definitely. So maybe we should encode if we get lines longer than,
> say, 950 characters?
Yeah, sure.
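To make the threshold idea concrete, here is a minimal sketch of "only qp-encode when some line is too long". It is not the code from the patch: the names needs_qp and encode_body are made up for illustration, it uses the standard quopri module on bytes, and it assumes the body is otherwise 7-bit clean.

```python
import quopri

# Threshold proposed in this thread; the hard SMTP limit is a bit higher.
MAX_LINE = 950


def needs_qp(body: bytes, limit: int = MAX_LINE) -> bool:
    """Return True if any line of the body is longer than `limit` bytes."""
    return any(len(line) > limit for line in body.splitlines())


def encode_body(body: bytes, limit: int = MAX_LINE):
    """Return (payload, content_transfer_encoding) for a mail body.

    Hypothetical helper: quoted-printable is used only when a line exceeds
    the limit, so ordinary patches stay readable without decoding.
    """
    if needs_qp(body, limit):
        # quotetabs defaults to False, so embedded spaces and tabs are
        # left alone and long lines are split with soft line breaks.
        return quopri.encodestring(body), "quoted-printable"
    # Assumption: the body contains only 7-bit characters in this branch.
    return body, "7bit"
```

With this, a normal patch goes out untouched and only the rare mail with a very long line gets encoded.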
> >> There is an email.Encoders.encode_quopri function which does the
> >> same and sets the Content-Transfer-Encoding header at the same
> >> time.
> > I tried it and decided to roll my own. The problem is that it leaves
> > a Content-Transfer-Encoding header of "7bit" for me in addition to
> > the qp one (so I'd have to remove one anyways). Second, it also
> > qp-encodes spaces and tabs which is quite some bloat and renders the
> > text completely unreadable for humans.
> Aha, I had no idea about that... Great that you tried it :-)
I think I'll need to file some bug reports against Python. There are
more problems than just that one, unfortunately.
> So it should be fine to pass 'us-ascii' as a charset. With that in
> mind, I think this should be equivalent to your patch:
Yes, it is, and it works. I also find it slightly more readable.
> I tried analyzing the cases and came to the conclusion that the
> charset is set to 'utf-8' if and only if we're not displaying the
> patch, we're not able to decode with 'us-ascii' and we are able to
> decode with 'utf-8'. Like this, except that it doesn't work due to the
> exceptions:
>   if not display and not s.decode('us-ascii') and s.decode('utf-8'):
>       cs = 'utf-8'
>   else:
>       cs = 'us-ascii'
> Do you think that looks okay?
Yes. I'll see if I can produce some tests to verify the various cases
with non-ascii and broken utf-8 plus combinations of these.
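For reference, the condition above can be written with explicit exception handling so that it actually runs. This is only a sketch of the logic as described in this mail, not the code in the patch; the name pick_charset is made up, and s is assumed to be a byte string.

```python
def pick_charset(s: bytes, display: bool) -> str:
    """Pick 'utf-8' only if we are not displaying the patch, the data is
    not plain ASCII, and it is valid UTF-8; otherwise use 'us-ascii'.

    Hypothetical helper illustrating the case analysis from this thread.
    """
    if display:
        return "us-ascii"
    try:
        s.decode("us-ascii")
        return "us-ascii"  # pure ASCII, nothing more to check
    except UnicodeDecodeError:
        pass
    try:
        s.decode("utf-8")
        return "utf-8"  # non-ASCII but valid UTF-8
    except UnicodeDecodeError:
        return "us-ascii"  # broken UTF-8 falls back to the default
```

The cases to cover in tests would then be plain ASCII, valid UTF-8, broken UTF-8, and each of those with display turned on.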
Rocco