[PATCH 0 of 1] Decode UTF-8 e-mail headers present in git formatted patches

Matt Mackall mpm at selenic.com
Wed Nov 13 11:46:37 CST 2013


On Tue, 2013-11-12 at 11:13 +0100, funman at videolan.org wrote:
> This new patch adds a test which is currently failing.
> 
> http://mercurial.selenic.com/wiki/EncodingStrategy#The_encoding_tracking_problem
> says user names are 'owned and managed' by Mercurial and should be encoded as UTF-8.
> 
> This does not seem to be the case, as HGENCODING=ascii (as used by the test suite) will
> render the user name 'ë' (0xc3 0xab) as '?'.

This is correct: when you tell Mercurial that your terminal only does
ASCII, it can't show you an 'ë'.

User names are stored -on disk- as UTF-8, but manipulated and displayed
in the local character set, whatever that may be. 
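To make the storage/display split concrete, here is a minimal Python 3 illustration (not Mercurial code). The user name 'ë' from the test is two UTF-8 bytes on disk (0xc3 0xab). Converting it for an ASCII-only terminal can only produce '?'. The helper name `to_display` and its use of `errors="replace"` are assumptions made for this sketch.

```python
# The user name as stored on disk: UTF-8 bytes.
stored = "ë".encode("utf-8")
print(stored)  # b'\xc3\xab'

def to_display(utf8_bytes: bytes, local_encoding: str) -> bytes:
    """Hypothetical helper: convert stored UTF-8 bytes to the local
    charset, replacing characters it cannot represent with '?'."""
    return utf8_bytes.decode("utf-8").encode(local_encoding, "replace")

print(to_display(stored, "ascii"))    # b'?'    (ASCII has no 'ë')
print(to_display(stored, "latin-1"))  # b'\xeb' (Latin-1 does)
```

The '?' is purely a display artifact. The stored bytes are unchanged.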

> Is encoding.fromlocal being used? I couldn't figure it out.

Yes. It's here:

http://www.selenic.com/hg/file/c38c3fdc8b93/mercurial/changelog.py#l308

> Having email_decode return a UTF-8 string instead of using tolocal will make the test
> fail with:

You must use tolocal. It is magic and can be made to work _even if the
local encoding is ASCII_.

http://www.selenic.com/hg/file/c38c3fdc8b93/mercurial/encoding.py#l61
http://mercurial.selenic.com/wiki/EncodingStrategy#Round-trip_conversion
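A rough Python 3 sketch of the round-trip idea from the wiki page follows. This is an illustration under stated assumptions, not Mercurial's implementation. When the conversion to the local encoding is lossy, the result is a bytes subclass that also remembers the original UTF-8, so that `fromlocal` can recover it exactly. The names `LocalStr`, `tolocal`, `fromlocal` and `email_decode` here are hypothetical stand-ins. `email_decode` decodes an RFC 2047 header with the standard library's `email.header.decode_header`, returns UTF-8, and leaves the final step to `tolocal`.

```python
from email.header import decode_header

class LocalStr(bytes):
    """Bytes in the local encoding that also carry the original UTF-8."""
    def __new__(cls, utf8: bytes, local: bytes):
        s = super().__new__(cls, local)
        s.utf8 = utf8  # exact bytes to hand back to fromlocal
        return s

def tolocal(utf8: bytes, encoding: str = "ascii") -> bytes:
    u = utf8.decode("utf-8")
    local = u.encode(encoding, "replace")
    if local.decode(encoding) == u:
        return local                # lossless: plain bytes suffice
    return LocalStr(utf8, local)    # lossy: remember the original

def fromlocal(s: bytes, encoding: str = "ascii") -> bytes:
    if isinstance(s, LocalStr):
        return s.utf8               # round trip restores the UTF-8 bytes
    return s.decode(encoding).encode("utf-8")

def email_decode(header: str) -> bytes:
    """Hypothetical: decode an RFC 2047 header into UTF-8 bytes."""
    parts = []
    for chunk, charset in decode_header(header):
        if isinstance(chunk, bytes):
            parts.append(chunk.decode(charset or "ascii"))
        else:
            parts.append(chunk)     # header had no encoded words
    return "".join(parts).encode("utf-8")

name = tolocal(email_decode("=?utf-8?q?=C3=AB?="))
print(bytes(name))                  # b'?'  (what an ASCII terminal shows)
print(fromlocal(name) == b"\xc3\xab")  # True (original recovered)
```

The point of the design is that the string you display and the string you store are the same object. Code that passes the name through to the changelog unchanged does not lose the 'ë', even though it prints as '?'.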

-- 
Mathematics is the supreme nostalgia of our time.




More information about the Mercurial-devel mailing list