[PATCH] (resend+bugfix) Make patchbomb extension honor global encoding setting.

Wesley J. Landaker wjl at icecavern.net
Sat Jul 14 14:08:13 CDT 2007


On Saturday 14 July 2007 12:27:15 Matt Mackall wrote:
> If I'm reading this correctly, this is attaching the patch marked as
> being in the system encoding. Which is the wrong thing to do.
[...]
> The contents of the patch (leaving aside the metadata) are in an
> unspecified and quite possibly nonexistent or inconsistent encoding. Our
> goal is to transfer those byte-for-byte, not glyph-for-glyph.

The whole situation is not perfect, because there is no way to know what
encoding the *patch* is in. But this change makes it work like the rest of
mercurial: it assumes things are in the encoding that is set globally, or
in the one passed in with --encoding.
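
To make the precedence concrete, here is a rough sketch. It is only an
illustration, not mercurial's actual code: resolve_encoding and its
parameters are hypothetical names, and I'm assuming the global setting comes
from the HGENCODING environment variable with a fallback to the locale.

```python
import locale


def resolve_encoding(cli_encoding=None, environ=None):
    """Pick the charset to declare for an outgoing patch (hypothetical helper)."""
    environ = {} if environ is None else environ
    if cli_encoding:                    # an explicit --encoding value wins
        return cli_encoding
    if environ.get("HGENCODING"):       # assumed source of the global setting
        return environ["HGENCODING"]
    # Otherwise fall back to the locale's preferred encoding, then to ASCII.
    return locale.getpreferredencoding(False) or "ascii"
```

Whatever this returns is only used as a label; it says nothing about what
the patch bytes really contain.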

This isn't really any different than what mercurial currently does if you 
do, say, hg export, or hg export --encoding 'whatever'.

The current behavior of sending as 'us-ascii' is *always* wrong, because it
never matches the internally used encoding except by accident (when the
content happens to be pure ASCII, since UTF-8 is a superset of us-ascii).

Just like it does currently, the patch is still sent byte-for-byte, even if
it doesn't *actually* match the encoding. It's not re-encoded; this just
changes the Content-Type: header to say that the patch is in whatever
encoding is given.
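
A minimal sketch of that idea, built by hand rather than with patchbomb's
real code: build_patch_part is a hypothetical helper, and the text/x-patch
type and 8bit transfer encoding are assumptions made for the example. Only
the charset label differs between the two parts; the payload bytes are
identical.

```python
def build_patch_part(patch_bytes: bytes, charset: str) -> bytes:
    """Return one MIME part: ASCII headers followed by the untouched patch bytes."""
    headers = (
        f'Content-Type: text/x-patch; charset="{charset}"\r\n'
        "Content-Transfer-Encoding: 8bit\r\n"
        "\r\n"
    ).encode("ascii")
    # The payload is appended as-is; it is never decoded or re-encoded.
    return headers + patch_bytes


patch = "+ # area = π r²\n".encode("utf-8")
old_part = build_patch_part(patch, "us-ascii")  # label used before the change
new_part = build_patch_part(patch, "utf-8")     # label taken from the encoding setting
```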

> Unfortunately, the only solution that actually works is a) have
> everyone use the same encoding (eg UTF-8) and b) have mail clients
> pass things as 8-bit.
[...]
> It may be reasonable for patches placed in message bodies to be marked as
> in the local charset. And it may also be reasonable to always write
> UTF-8 metadata in attachments. But both are a bit problematic..

Okay, well, let's decide how to fix this, because it's currently wrong. This 
patch was created because right now, it generates bad emails for *any* 
encoding, unless it's strictly us-ascii.

Right now, the behavior is:
  * Raw bytes stuffed into email.
  * Content type marked as 'us-ascii'.
  * Patch always mangled if it's not really us-ascii

With this patch, the behavior is:
  * Raw bytes stuffed into email.
  * Content type marked as the default encoding, or the --encoding value.
  * Patch always correct if it matches the given encoding.
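
The difference shows up on the receiving side. As a rough sketch (render is
a hypothetical stand-in for whatever a mail client does, and using
errors="replace" is an assumption about how a client handles bad bytes), the
same raw bytes come out differently depending on the declared charset:

```python
def render(raw: bytes, declared_charset: str) -> str:
    """Decode bytes the way a receiving client might, given the declared charset."""
    # errors="replace" stands in for a client that shows U+FFFD for bad bytes.
    return raw.decode(declared_charset, errors="replace")


raw = "fix ∑ and ≤ handling".encode("utf-8")

as_ascii = render(raw, "us-ascii")      # every non-ASCII byte becomes U+FFFD
as_utf8 = render(raw, "utf-8")          # matches the real encoding, round-trips
as_latin1 = render(raw, "iso-8859-1")   # decodes without error, but to wrong glyphs
```

Pure-ASCII content renders the same under either label, which is why the
'us-ascii' marking only works by accident.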

So at least with this patch it works perfectly if everyone is using UTF-8
(or any other encoding, as long as everyone uses the same one). Without this
patch, it only works if everyone is using us-ascii. In all other cases,
currently, patches, logs, and names are all totally mangled.

The current patch is IMO much, much better than it was before, since it
works with a default UTF-8 workflow, which is critical for people who need
something besides us-ascii in their log messages, names, files, etc.

Do you (anyone) have some suggestions on how to do this better?

I really am happy to implement it, because I *need* this functionality on a 
daily basis. I use UTF-8 encoding exclusively; I personally use lots of 
technical and math symbols. So all my e-mailed patches are broken as soon 
as they have a non-ascii character.

-- 
Wesley J. Landaker <wjl at icecavern.net> <xmpp:wjl at icecavern.net>
OpenPGP FP: 4135 2A3B 4726 ACC5 9094  0097 F0A9 8A4C 4CD6 E3D2