[PATCH] (resend+bugfix) Make patchbomb extension honor global encoding setting.
Wesley J. Landaker
wjl at icecavern.net
Sat Jul 14 14:08:13 CDT 2007
On Saturday 14 July 2007 12:27:15 Matt Mackall wrote:
> If I'm reading this correctly, this is attaching the patch marked as
> being in the system encoding. Which is the wrong thing to do.
[...]
> The contents of the patch (leaving aside the metadata) are in an
> unspecified and quite possibly nonexistent or inconsistent encoding. Our
> goal is to transfer those byte-for-byte, not glyph-for-glyph.
The whole situation is not perfect, because there is no way to know what
encoding the *patch* is in. But this change makes it work like the rest of
mercurial, and assumes things are in the same encoding as what is set
globally, or what is passed in with --encoding.
This isn't really any different than what mercurial currently does if you
do, say, hg export, or hg export --encoding 'whatever'.
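
As a rough illustration of that rule (a sketch only, not Mercurial's actual
code), the charset choice could look like the following. The function name
`effective_encoding` and both of its parameters are hypothetical; the only
thing taken from this message is the order of preference: an explicit
--encoding value first, otherwise the global setting.

```python
def effective_encoding(global_encoding, cli_encoding=None):
    """Pick the charset name to declare for an outgoing patch.

    Hypothetical helper: an explicit --encoding value wins; otherwise
    the globally configured encoding is used unchanged.
    """
    if cli_encoding:
        return cli_encoding
    return global_encoding


# No --encoding given: fall back to the global setting.
default_choice = effective_encoding("utf-8")
# --encoding given: it overrides the global setting.
override_choice = effective_encoding("utf-8", "iso-8859-15")
```
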
Currently, the behavior of sending with 'us-ascii' is *always* wrong, because
it doesn't ever match the internally used encoding, except by accident
(because UTF-8 is a superset of us-ascii, so pure-ASCII content comes out
right).
Just like it does currently, the patch is still sent byte-for-byte, even if
it doesn't *actually* match the encoding. It's not re-encoded; this
basically just changes the Content-Type: header to say that the patch is in
whatever encoding is given.
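
To make the "header only" point concrete, here is a minimal sketch that
builds a single text part by hand. It is not the patchbomb code: `build_part`
is a hypothetical helper, and the text/plain type and sample patch content
are assumptions chosen for illustration. The point is that the body bytes
are identical before and after; only the declared charset differs.

```python
def build_part(patch_bytes, charset):
    """Return header + blank line + body; the body bytes are never touched."""
    header = 'Content-Type: text/plain; charset="%s"' % charset
    return header.encode("ascii") + b"\r\n\r\n" + patch_bytes


# Sample patch line containing non-ASCII symbols, stored as UTF-8 bytes.
# Escapes are used so the example does not depend on source-file encoding.
patch_bytes = "# \u2211 x\u00b2 \u2265 0\n".encode("utf-8")

old = build_part(patch_bytes, "us-ascii")  # label used before the change
new = build_part(patch_bytes, "utf-8")     # label used after the change

# Split each part back into header and body to compare the payloads.
old_body = old.split(b"\r\n\r\n", 1)[1]
new_body = new.split(b"\r\n\r\n", 1)[1]
```
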
> Unfortunately, the only solution that actually works is a) have
> everyone use the same encoding (eg UTF-8) and b) have mail clients
> pass things as 8-bit.
[...]
> It may be reasonable for patches placed in message bodies to be marked as
> in the local charset. And it may also be reasonable to always write
> UTF-8 metadata in attachments. But both are a bit problematic..
Okay, well, let's decide how to fix this, because it's currently wrong. This
patch was created because right now, it generates bad emails for *any*
encoding, unless it's strictly us-ascii.
Right now, the behavior is:
* Raw bytes stuffed into email.
* Content type marked as 'us-ascii'.
* Patch always mangled if it's not really us-ascii.
With this patch, behavior is:
* Raw bytes stuffed into email.
* Content type marked as default encoding, or --encoding encoding.
* Patch always correct if it matches the given encoding.
So at least with this patch it works perfectly if everyone is using UTF-8
(or any encoding, as long as everyone uses the same one). Without this
patch, it only works if everyone is using us-ascii. In all other cases,
currently, patches, logs, and names are all totally mangled.
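
The difference between the two behaviors can be sketched from the
receiver's side. This assumes a mail client that simply decodes the raw
bytes using whatever charset the header declares; `receiver_view` is a
hypothetical stand-in for that client, and the sample text is made up.

```python
def receiver_view(raw_bytes, declared_charset):
    """Decode the body the way a client trusting the header might."""
    # Undecodable bytes become U+FFFD instead of raising an error.
    return raw_bytes.decode(declared_charset, errors="replace")


# "Groesse >= 0" written with real non-ASCII characters, sent as UTF-8.
text = "Gr\u00f6\u00dfe \u2265 0"
raw = text.encode("utf-8")

as_ascii = receiver_view(raw, "us-ascii")     # current label: mangled
as_utf8 = receiver_view(raw, "utf-8")         # label matches: correct
as_latin1 = receiver_view(raw, "iso-8859-1")  # wrong label: still mangled

# Pure-ASCII content survives under either label.
plain = receiver_view(b"fix typo", "us-ascii")
```

The last case is why the us-ascii label only ever works by accident: the
bytes are the same in every call, and only a label that matches the real
encoding gives the original text back.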
The current patch is IMO much much better than it was before, since it works
with a default UTF-8 flow, which is very critical for people who need
something besides us-ascii in their log messages, names, files, etc.
Do you (anyone) have some suggestions on how to do this better?
I really am happy to implement it, because I *need* this functionality on a
daily basis. I use UTF-8 encoding exclusively; I personally use lots of
technical and math symbols. So all my e-mailed patches are broken as soon
as they have a non-ascii character.
--
Wesley J. Landaker <wjl at icecavern.net> <xmpp:wjl at icecavern.net>
OpenPGP FP: 4135 2A3B 4726 ACC5 9094 0097 F0A9 8A4C 4CD6 E3D2