[PATCH 0 of 6] Improve readability of non-ascii hg emails (issue814)

Sun Jul 13 11:03:48 CDT 2008

On Sat, 2008-07-12 at 21:28 +0100, Christian Ebert wrote:
> * Matt Mackall on Friday, July 11, 2008 at 11:48:34 -0500
> > On Wed, 2008-07-09 at 17:33 +0100, Christian Ebert wrote:
> >> I resubmit a patch series to improve handling of non-ascii mail.
> > 
> > Hey folks: when you submit a patch that tries to tackle a tricky technical
> > issue, you should describe exactly what your approach is and why it's
> > better than the alternatives up front. This patch series may very well
> > be the best possible answer to the problem,
> 
> I guarantee that it is not ;)
> 
> > and you may have already
> > described exactly what your approach is to me last week, but a) I've
> > completely forgotten already and b) it needs to be in the changelog
> > because I'll have forgotten it again next week. 
> 
> Hm, I tried hard in the description you snipped.

Ahh, but you wrote it aimed at someone who remembers what the problem
was in the first place. Let me jog my brain a bit. You wrote:

> 1. Patches
> 
> Patches must be kept independent of conventions between sender
> and recipient. They are sent in ascii, utf-8, or as fake ascii
> (current behaviour; see also TODO). utf-8 is safe to detect.

Ok, so that means when we send an inline patch, we'll send the
description in the same form as the patch, possibly promoting the patch?
Alright, so that suggests this table:

description  inline patch
             ascii        utf-8     other
ascii        ascii        utf-8     fake-ascii
utf-8        utf-8        utf-8     ??

So if someone checks in a file with latin-1 (aka other) and latin-1
description (converted to utf-8), what happens? Do we call the message
ascii? Do we transcode our utf-8 back to ascii? Or do we put the utf-8
description in the message body and still call it ascii? 
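To pin the question down, the table can be written as a small lookup. This is only a sketch in present-day Python, not code from the patch series: `classify`, `TABLE` and `message_charset` are hypothetical names, the description is assumed to be already decoded text, and the unresolved "??" cell is deliberately left as None.

```python
def classify(data):
    """Classify raw patch bytes as 'ascii', 'utf-8' or 'other'."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        # utf-8 is strict enough that a successful decode is a safe signal.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


# (description kind, patch kind) -> form the message is sent in
TABLE = {
    ("ascii", "ascii"): "ascii",
    ("ascii", "utf-8"): "utf-8",
    ("ascii", "other"): "fake-ascii",
    ("utf-8", "ascii"): "utf-8",
    ("utf-8", "utf-8"): "utf-8",
    ("utf-8", "other"): None,  # the open "??" case
}


def message_charset(description, patch):
    """Look up the message form for a text description and raw patch bytes."""
    desc_kind = "ascii" if description.isascii() else "utf-8"
    return TABLE[(desc_kind, classify(patch))]
```

The latin-1 file with a non-ascii description lands on the None entry, which is exactly the case that still needs an answer.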

> 2. Mail parts that do not contain patches
> 
> Introduce new [email] sendcharsets config (default:
> util._encoding). us-ascii is always implied and tried first.

Does this interact with the above? Really, inline patches are the
interesting piece of this puzzle.

> [email]
> # for westerners
> sendcharsets = iso-8859-1, iso-8859-15, windows-1252
> # other examples:
> # iso-8859-1, iso-8859-15, windows-1252, iso-8859-2, windows-1250
> # iso-8859-1, iso-8859-15, windows-1252, iso-8859-2, windows-1250,
> #   iso-2022-jp, iso-2022-jp-ms
> 
> (idea stolen from Mutt)
> 
> For headers and message parts that do not contain patches the
> convert function cycles through sendcharsets in descending order
> to try a successful conversion.
> 
> Both $HGENCODING and util._fallbackencoding are tried for input.
> 
> As last resort the conversion falls back to fake ascii (ie. the
> current behaviour).

Uh, what about utf-8?
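For concreteness, the fallback chain as described reads roughly like the sketch below. It is written in present-day Python and is not the code from the series: `encode_for_mail` is a hypothetical name, the input is assumed to be already decoded, and "fake ascii" is assumed to mean shipping the raw bytes while still labelling them us-ascii.

```python
def encode_for_mail(text, sendcharsets):
    """Return (payload, charset) for the first charset that can encode text."""
    # us-ascii is always implied and tried first, then the configured list.
    for charset in ["us-ascii"] + list(sendcharsets):
        try:
            return text.encode(charset), charset
        except (UnicodeEncodeError, LookupError):
            # Not representable, or charset unknown to Python: try the next.
            continue
    # Last resort (assumed meaning of "fake ascii"): 8-bit bytes, ascii label.
    return text.encode("utf-8"), "us-ascii"
```

Read this way, utf-8 is only ever announced if the user lists it in sendcharsets, which is what the question above is getting at.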

> > Also, don't do the **opts thing. Passing arbitrary sets of options
> > around is best avoided. Yes, I know there are a bunch of places we do it
> > in hg, and those are all pretty unfortunate. Here you're checking a
> > grand total of one option (test), and doing the wrong thing: we should
> > actually output the encoding that will be sent! How else can you
> > actually "test" mail encoding?
> 
> I beg to defer. Something like the following, while correct, is
> not helpful to a user:

s/defer/differ/. I suppose you're right. Nonetheless, you should trade
your **opts arg for a test boolean (perhaps with a better name).
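A sketch of what that could look like, in present-day Python. Everything here is hypothetical: `mail_body` and the `display` flag are made-up names, and base64 stands in for whatever transfer encoding the real code picks.

```python
import base64


def mail_body(text, charset, display=False):
    """Encode text for sending, or return it readable when display is True."""
    if display:
        # Preview mode: show the user readable text, not the encoded payload.
        return text
    # Real send: encode to the chosen charset, then base64 for transport.
    return base64.b64encode(text.encode(charset)).decode("ascii")
```

The caller then passes the single flag it cares about, e.g. `mail_body(desc, "utf-8", display=True)`, instead of handing over the whole option dictionary.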

-- 
Mathematics is the supreme nostalgia of our time.