[PATCH] patch: decode e-mail headers
Rafaël Carré
funman at videolan.org
Wed Oct 23 08:23:55 CDT 2013
Le 23/10/2013 09:01, Martin Geisler a écrit :
> funman at videolan.org writes:
>
>> # HG changeset patch
>> # User Rafaël Carré <funman at videolan.org>
>> # Date 1382444275 -7200
>> # Tue Oct 22 14:17:55 2013 +0200
>> # Branch stable
>> # Node ID e8c0f97e42ca9e09b4000245bd713f03e5d72038
>> # Parent 2c886dedd9021598b6290d95ea0f068731ea4e2b
>> patch: decode e-mail headers
>>
>> Change commits from:
>> user: =?UTF-8?q?Rafa=C3=ABl=20Carr=C3=A9?= <funman at videolan.org>
>> to:
>> user: Rafaël Carré <funman at videolan.org>
>>
>> diff -r 2c886dedd902 -r e8c0f97e42ca mercurial/patch.py
>> --- a/mercurial/patch.py Mon Oct 21 10:50:58 2013 -0700
>> +++ b/mercurial/patch.py Tue Oct 22 14:17:55 2013 +0200
>> @@ -12,6 +12,7 @@
>> # load. This was not a problem on Python 2.7.
>> import email.Generator
>> import email.Parser
>> +from email.header import decode_header
>
> We normally only import modules and reference functions with
>
> module.function
>
> This is because we have a demand-import system in place that can delay
> loading the module until it is really needed. This makes Mercurial start
> faster.
Alright.
>> from i18n import _
>> from node import hex, short
That file has counter examples though :)
>> @@ -162,6 +163,25 @@
>> Any item in the returned tuple can be None. If filename is None,
>> fileobj did not contain a patch. Caller must unlink filename when done.'''
>>
>> + def header_decode(h):
>> + '''Decode ?=UTF-8? from e-mail headers.'''
>> + if h is None:
>> + return None
>> + res = ''
>> + pairs = decode_header(h)
>
> You define a header_decode function and there is apparently already a
> decode_header function in scope -- that looks confusing to me :-)
>
> The names are so similar that I wouldn't know what the difference is
> between the two functions. Maybe you can come up with a better name for
> the new function.
Yup, will try to come up with something better indeed.
>> + if pairs is None:
>> + return None
>> + n = len(pairs)
>> + pair = 0
>> + for p in pairs:
>> + pair += 1
>> + if p[1] == 'utf-8' or p[1] is None:
>
> Does this mean that you only handle UTF-8 encoded headers? What about
> ISO-8859-* encoded headers?
I can look at other encodings but first I need to solve the problem
mentioned in
http://www.selenic.com/pipermail/mercurial-devel/2013-October/054402.html
That is user and subject are expected to use local encoding and not
utf-8, which causes character loss when local encoding is ASCII (e.g. in
the testsuite), and perhaps in other encodings too.
It seems a bit hard problem for a newcomer to mercurial like me so any
advice is very appreciated here.
More information about the Mercurial-devel
mailing list