[PATCH] (resend+bugfix) Make patchbomb extension honor global encoding setting.

Sat Jul 14 13:27:15 CDT 2007

On Sat, Jul 14, 2007 at 10:57:26AM -0600, Wesley J. Landaker wrote:
> # HG changeset patch
> # User Wesley J. Landaker <wjl at icecavern.net>
> # Date 1184432222 21600
> # Node ID edc4c0731a68c25278253324385b6eed2e56fb3a
> # Parent  28b23b9073a8652f95b87975f6648912dfec5f71
> Make patchbomb extension honor global encoding setting.
> 
> Currently the patchbomb extension does not honor the global encoding
> setting: it uses neither the default nor the value of the --encoding
> option. Instead, it always generates emails and patches using us-ascii.
> This obviously works 90% of the time, but isn't correct, and breaks
> things whenever people's names, changeset logs, etc. are in the default
> encoding (usually UTF-8).
> 
> This patch makes patchbomb use the global encoding used everywhere else
> in mercurial. Because things were broken for all encodings previously,
> and the default mercurial encoding is a superset of us-ascii, this does
> not have any backwards compatibility issues.
> 
> diff -r 28b23b9073a8 -r edc4c0731a68 hgext/patchbomb.py
> --- a/hgext/patchbomb.py	Fri Jul 13 08:28:57 2007 -0700
> +++ b/hgext/patchbomb.py	Sat Jul 14 10:57:02 2007 -0600
> @@ -65,7 +65,7 @@
>  # That should be all.  Now your patchbomb is on its way out.
>  
>  import os, errno, socket, tempfile
> -import email.MIMEMultipart, email.MIMEText, email.MIMEBase
> +import email.MIMEMultipart, email.MIMEText, email.MIMEBase, email.Charset
>  import email.Utils, email.Encoders
>  from mercurial import cmdutil, commands, hg, mail, ui, patch, util
>  from mercurial.i18n import _
> @@ -161,6 +161,11 @@ def patchbomb(ui, repo, *revs, **opts):
>          #        'Patch subject is complete summary.')
>          #body += '\n\n\n'
>  
> +        # always use raw encoding if we are previewing to the screen
> +        # to make it easier to manually review without a mail client
> +        if opts['test']:
> +            email.Charset.add_charset(util._encoding.lower(), None, None)
> +
>          if opts['plain']:
>              while patch and patch[0].startswith('# '): patch.pop(0)
>              if patch: patch.pop(0)
> @@ -169,8 +174,8 @@ def patchbomb(ui, repo, *revs, **opts):
>              body += cdiffstat('\n'.join(desc), patch) + '\n\n'
>          if opts['attach']:
>              msg = email.MIMEMultipart.MIMEMultipart()
> -            if body: msg.attach(email.MIMEText.MIMEText(body, 'plain'))
> -            p = email.MIMEText.MIMEText('\n'.join(patch), 'x-patch')
> +            if body: msg.attach(email.MIMEText.MIMEText(body, 'plain', util._encoding))
> +            p = email.MIMEText.MIMEText('\n'.join(patch), 'x-patch', util._encoding)

If I'm reading this correctly, this attaches the patch marked as being
in the system encoding, which is the wrong thing to do.
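To make "marked as being in the system encoding" concrete, here is a minimal sketch of what the patched call puts on the wire. It uses Python 3's email.mime.text.MIMEText (the patch uses the Python 2 module name email.MIMEText) and assumes the system encoding is UTF-8, standing in for util._encoding; the patch lines are made up.

```python
from email.mime.text import MIMEText

# Assumption: the system encoding is UTF-8 (a stand-in for util._encoding).
system_encoding = "utf-8"
patch_lines = ["--- a/README", "+++ b/README", "+Maintainer: José"]  # hypothetical patch

# Same call shape as the patched line: MIMEText(text, 'x-patch', encoding)
part = MIMEText("\n".join(patch_lines), "x-patch", system_encoding)

# The attachment now makes an explicit charset claim; Python's default
# transfer encoding for UTF-8 bodies is base64.
print(part["Content-Type"])               # text/x-patch; charset="utf-8"
print(part["Content-Transfer-Encoding"])  # base64
```

Any client that honors that charset parameter is entitled to treat the attachment as text in that encoding rather than as opaque bytes.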

The contents of the patch (leaving aside the metadata) are in an
unspecified and quite possibly nonexistent or inconsistent encoding. Our
goal is to transfer those byte-for-byte, not glyph-for-glyph. 
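A small illustration of why no single charset label can be truthful for such a patch: the hypothetical patch below touches one Latin-1 file and one UTF-8 file, so its bytes are valid neither as ASCII nor as UTF-8. Latin-1 "decodes" any byte string, but only by assigning the wrong glyphs to the UTF-8 line. File names and contents are invented for the example.

```python
# Hypothetical patch touching two files stored in different encodings.
patch_bytes = (
    b"--- a/legacy.txt\n"
    b"+++ b/legacy.txt\n"
    b"+author: Jos\xe9\n"        # 'José' encoded as Latin-1
    b"--- a/modern.txt\n"
    b"+++ b/modern.txt\n"
    b"+author: Jos\xc3\xa9\n"    # 'José' encoded as UTF-8
)

def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if data is valid text in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True

for enc in ("ascii", "utf-8", "latin-1"):
    print(enc, decodes_cleanly(patch_bytes, enc))
# prints:
# ascii False
# utf-8 False
# latin-1 True

# Latin-1 only "succeeds" by misreading the UTF-8 line:
print(patch_bytes.decode("latin-1").splitlines()[-1])  # +author: JosÃ©
```

Shipping the bytes unchanged avoids having to pick a label at all.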

If we set an encoding, we're saying "character x corresponds to glyph
y" and we're also saying "please be sure to display/save glyph y". If
the client is using a different encoding, he's perfectly justified in
saving glyph y as character z. So this is a step backward from no
encoding at all.
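A sketch of that failure mode, assuming a sender who labels a Latin-1 patch as iso-8859-1 and a receiving client whose local encoding is UTF-8. The helper save_as_receiver is hypothetical: it models a client that honors the declared charset (bytes to glyphs) and then saves in its own encoding (glyphs to bytes).

```python
# Hypothetical Latin-1 patch fragment: \xe9 is 'é' and \xed is 'í' in Latin-1.
original = b"-name = 'Jose'\n+name = 'Jos\xe9 Garc\xeda'\n"

def save_as_receiver(data: bytes, declared: str, local: str) -> bytes:
    """Model a client that trusts the declared charset, then saves in its own."""
    glyphs = data.decode(declared)   # bytes -> characters, per the label
    return glyphs.encode(local)      # characters -> bytes, per the receiver

saved = save_as_receiver(original, declared="iso-8859-1", local="utf-8")
print(saved == original)  # False: same glyphs, different bytes
```

Every glyph survived, yet the saved lines no longer match the bytes in a Latin-1 working copy, so the patch may fail to apply.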

Unfortunately, the only solution that actually works is to a) have
everyone use the same encoding (e.g. UTF-8) and b) have mail clients
pass things through as 8-bit.
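A minimal sketch of that combination (UTF-8 text, 8bit transfer encoding), using Python 3's email.message.EmailMessage API rather than the Python 2 email.MIME* modules patchbomb used at the time. Feeding the serialized bytes straight back into the parser stands in for an 8-bit-clean mail path; real MTAs and clients may still re-encode along the way, which is exactly the caveat. The patch text is hypothetical.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Hypothetical patch text; everything, including the author name, is UTF-8.
patch_text = (
    "# HG changeset patch\n"
    "# User José García <jose@example.org>\n"
    "--- a/README\n"
    "+++ b/README\n"
    "+Maintainer: José García\n"
)

def send_and_receive(text: str) -> bytes:
    """Serialize text as an 8bit UTF-8 part, reparse it, return the raw body bytes."""
    msg = EmailMessage()
    msg["Subject"] = "[PATCH] example"
    # Declare UTF-8 and request an 8bit transfer encoding (no base64 or QP).
    msg.set_content(text, subtype="x-patch", charset="utf-8", cte="8bit")
    wire = msg.as_bytes()  # stands in for an 8-bit-clean mail path
    received = message_from_bytes(wire, policy=policy.default)
    return received.get_payload(decode=True)

received_bytes = send_and_receive(patch_text)
print(received_bytes == patch_text.encode("utf-8"))  # True
```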

It may be reasonable for patches placed in message bodies to be marked as
in the local charset. And it may also be reasonable to always write
UTF-8 metadata in attachments. But both are a bit problematic.
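One way to read those two options together, as a sketch only: the inline body is labelled with the local charset, while the attachment starts with UTF-8 metadata (loosely modelled on the hg export header in the quoted patch) and carries the diff bytes untouched, with no charset parameter and base64 as the transfer encoding. The function build_message, its parameters, and the example data are hypothetical, and this does not remove the problems noted above: a reader's tools still have to guess how to display the diff lines.

```python
from email import encoders, message_from_bytes
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_message(body: str, user: str, description: str,
                  diff: bytes, local_charset: str) -> MIMEMultipart:
    """Hypothetical layout: local-charset body, UTF-8 metadata + raw diff attachment."""
    msg = MIMEMultipart()
    # Inline text the reader sees: labelled with the local charset.
    msg.attach(MIMEText(body, "plain", local_charset))
    # Attachment metadata is always encoded as UTF-8.
    header = f"# HG changeset patch\n# User {user}\n{description}\n\n".encode("utf-8")
    part = MIMEBase("text", "x-patch")       # note: no charset parameter
    part.set_payload(header + diff)          # diff bytes appended unchanged
    encoders.encode_base64(part)             # base64 carries arbitrary bytes intact
    part.add_header("Content-Disposition", "attachment", filename="example.patch")
    msg.attach(part)
    return msg

diff = b"-name = 'Jose'\n+name = 'Jos\xe9'\n"  # Latin-1 bytes, never decoded
msg = build_message("Fix the name.\n", "José García <jose@example.org>",
                    "Fix the name", diff, "iso-8859-1")

# Round-trip through the serializer and parser to see what a recipient gets.
received = message_from_bytes(msg.as_bytes())
body_part, patch_part = received.get_payload()
print(body_part.get_content_charset())                     # iso-8859-1
print(patch_part.get_param("charset"))                     # None
print(patch_part.get_payload(decode=True).endswith(diff))  # True
```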

-- 
Mathematics is the supreme nostalgia of our time.