[PATCH 0 of 6] Improve readability of non-ascii hg emails (issue814)

Christian Ebert blacktrash at gmx.net
Mon Jul 14 13:59:49 CDT 2008

* Matt Mackall on Monday, July 14, 2008 at 11:53:36 -0500
> On Mon, 2008-07-14 at 12:09 +0100, Christian Ebert wrote:
>>>> Patches must be kept independent of conventions between sender
>>>> and recipient. They are sent in ascii, utf-8, or as fake ascii
>>>> (current behaviour; see also TODO). utf-8 is safe to detect.
>>> Ok, so that means when we send an inline patch, we'll send the
>>> description in the same form as the patch, possibly promoting the patch?
>>> Alright, so that suggests this table:
>>>                 inline patch
>>> description     ascii     utf-8     other
>>> ascii           ascii     utf-8     fake-ascii
>>> utf-8           utf-8     utf-8     ??
>>> So if someone checks in a file with latin-1 (aka other) and latin-1
>>> description (converted to utf-8), what happens? Do we call the message
>>> ascii? Do we transcode our utf-8 back to ascii? Or do we put the utf-8
>>> description in the message body and still call it ascii? 
>>>> 2. Mail parts that do not contain patches
>>>> Introduce new [email] sendcharsets config (default:
>>>> util._encoding). us-ascii is always implied and tried first.
>>> Does this interact with the above?

No. Just to be clear.
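
To make point 2 concrete, here is a minimal sketch of the fallback I have
in mind for parts that do not contain patches. It assumes the text is
already a unicode string. The name encode_mailpart and its sendcharsets
argument are only illustrative, not existing Mercurial API, and the final
utf-8 fallback is my assumption as well.

```python
def encode_mailpart(text, sendcharsets=()):
    """Return (bytes, charset) for a mail part that contains no patch.

    us-ascii is always tried first, then each configured charset in order.
    The trailing utf-8 fallback is an assumption of this sketch.
    """
    for charset in ("us-ascii",) + tuple(sendcharsets) + ("utf-8",):
        try:
            return text.encode(charset), charset
        except (UnicodeEncodeError, LookupError):
            # Text not representable in this charset, or the charset name
            # is unknown to Python: move on to the next candidate.
            continue
    raise ValueError("text cannot be encoded in any candidate charset")
```

With sendcharsets set to iso-8859-1, a description containing only
latin-1 characters would go out as iso-8859-1, and anything else would
fall through to utf-8.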

>>> Really, inline patches is the
>>> interesting piece of this puzzle.
>> Indeed. You're going right for the interesting hairy stuff.
> That's because the rest of it is practically trivial. Everything but the
> patch itself is in a known encoding (utf-8) and it's a simple matter of
> programming to put it in an email. 
> When we look at inline patches, we have to answer some hard questions.
> And hopefully our answers to the easy questions haven't painted us into
> a corner. So let's look back at my table:
>>>                 inline patch
>>> description     ascii     utf-8     other
>>> ascii           ascii     utf-8     fake-ascii
>>> utf-8           utf-8     utf-8     ??
> What should happen in the corner? First let's note that there are a
> bunch of things that shouldn't happen. If we're sending a message and we
> pick latin1 as the encoding because the author's name had a ü in it, we
> shouldn't try to encode the patch as latin1, as it may in fact be koi8
> and the receiving user may in fact be using utf-8 and his mailer may
> helpfully save the patch in utf-8 at which point the content is now very
> wrong. Second, we probably can't do fake-utf-8 because mailers are quite
> likely to do the wrong thing or choke. Can we do better than fake-ascii?
> Probably not. Should we transcode the description text from utf-8?
> Maybe?
>> I might be wrong but e.g. trying util._encoding for an 8bit
>> patch that's not utf-8 is an optimistic assumption. The patch
>> might have been made with a different 8bit _encoding. Or the
>> "mixed" case above.
> Yep, we should definitely not assume anything about the contents of a
> patch. In fact, guessing utf-8 may also be trouble. Consider: I have a
> file called utf-8-example.txt. I send you a patch to add it to your
> repo. Your mailer is set to use latin1 by default. If hg mail marks the
> text as utf-8, your mailer may helpfully transcode it to latin1 and then
> we'll later discover that utf-8-example.txt is actually in latin1 on
> your machine. Fail.

I just tried with Mutt in an iso-8859-15 environment and a pure
utf-8 patch+description: saved as utf-8.
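
So Mutt left the bytes alone here. For reference, this is the
transformation I was checking for, as a minimal sketch. The function name
mailer_roundtrip is hypothetical and only models a mailer that decodes a
part by its declared charset and saves it in the local one.

```python
def mailer_roundtrip(data, declared_charset, local_charset):
    """Model a mailer that transcodes a text part when saving it."""
    return data.decode(declared_charset).encode(local_charset)


# Bytes as they sit in the sender's working directory (utf-8).
original = "Gr\u00fc\u00dfe\n".encode("utf-8")

# The recipient's mailer saves the part in its local charset.
saved = mailer_roundtrip(original, "utf-8", "latin-1")

# The text looks the same on screen, but the bytes on disk differ,
# so the saved patch no longer matches the sender's file content.
print(original)
print(saved)
print(saved == original)
```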

But of course you're right, mailers are bound to do all kinds of
"helpful" stuff.

> On the other hand, if we lie and claim it's binary,
> it'll be much harder to read.

There /is/ a clean and safe way, at the price of more bandwidth:

- separate changeset display in mailer and changeset data
- changeset as text-part transcoded to cheapest
  [email]charset for perusal in mailer
- attach changeset data as bundle

This could be restricted to non-ascii changesets.
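
A rough sketch of that layout with Python's email package, just to show
the shape of the message. The function name build_message, the charset
list, the attachment filename and the application/x-mercurial-bundle
subtype are assumptions for illustration, not what the extension
currently does.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(display_text, bundle_bytes, charsets=("us-ascii", "utf-8")):
    """Build a mail with a readable text part and the bundle attached."""
    # Pick the first ("cheapest") charset that can represent the text.
    for charset in charsets:
        try:
            display_text.encode(charset)
            break
        except UnicodeEncodeError:
            continue
    else:
        raise ValueError("no configured charset can encode the text")

    msg = MIMEMultipart()
    # Part 1: changeset display, for perusal in the mailer only.
    msg.attach(MIMEText(display_text, "plain", charset))
    # Part 2: the actual changeset data, carried as opaque bytes
    # (base64), so no mailer has a reason to transcode it.
    attachment = MIMEApplication(bundle_bytes, "x-mercurial-bundle")
    attachment.add_header("Content-Disposition", "attachment",
                          filename="changeset.hg")
    msg.attach(attachment)
    return msg
```

The text part may be transcoded freely because nobody applies it; the
recipient applies the attachment.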

  Was heißt hier Dogma, ich bin Underdogma!
[ What the hell do you mean dogma, I am underdogma. ]

_F R E E_  _V I D E O S_  -->>  http://www.blacktrash.org/underdogma/
