[PATCH] patch: decode e-mail headers

Wed Oct 23 08:23:55 CDT 2013

Le 23/10/2013 09:01, Martin Geisler a écrit :
> funman at videolan.org writes:
> 
>> # HG changeset patch
>> # User Rafaël Carré <funman at videolan.org>
>> # Date 1382444275 -7200
>> #      Tue Oct 22 14:17:55 2013 +0200
>> # Branch stable
>> # Node ID e8c0f97e42ca9e09b4000245bd713f03e5d72038
>> # Parent  2c886dedd9021598b6290d95ea0f068731ea4e2b
>> patch: decode e-mail headers
>>
>>     Change commits from:
>> user:        =?UTF-8?q?Rafa=C3=ABl=20Carr=C3=A9?= <funman at videolan.org>
>>     to:
>> user:        Rafaël Carré <funman at videolan.org>
>>
>> diff -r 2c886dedd902 -r e8c0f97e42ca mercurial/patch.py
>> --- a/mercurial/patch.py	Mon Oct 21 10:50:58 2013 -0700
>> +++ b/mercurial/patch.py	Tue Oct 22 14:17:55 2013 +0200
>> @@ -12,6 +12,7 @@
>>  # load. This was not a problem on Python 2.7.
>>  import email.Generator
>>  import email.Parser
>> +from email.header import decode_header
> 
> We normally only import modules and reference functions with
> 
>   module.function
> 
> This is because we have a demand-import system in place that can delay
> loading the module until it is really needed. This makes Mercurial start
> faster.

Alright.

>>  from i18n import _
>>  from node import hex, short

That file has counter examples though :)

>> @@ -162,6 +163,25 @@
>>      Any item in the returned tuple can be None. If filename is None,
>>      fileobj did not contain a patch. Caller must unlink filename when done.'''
>>  
>> +    def header_decode(h):
>> +        '''Decode ?=UTF-8? from e-mail headers.'''
>> +        if h is None:
>> +            return None
>> +        res = ''
>> +        pairs = decode_header(h)
> 
> You define a header_decode function and there is apparently already a
> decode_header function in scope -- that looks confusing to me :-)
> 
> The names are so similar that I wouldn't know what the difference is
> between the two functions. Maybe you can come up with a better name for
> the new function.

Yup, will try to come up with something better indeed.

>> +        if pairs is None:
>> +            return None
>> +        n = len(pairs)
>> +        pair = 0
>> +        for p in pairs:
>> +            pair += 1
>> +            if p[1] == 'utf-8' or p[1] is None:
> 
> Does this mean that you only handle UTF-8 encoded headers? What about
> ISO-8859-* encoded headers?

I can look at other encodings but first I need to solve the problem
mentioned in
http://www.selenic.com/pipermail/mercurial-devel/2013-October/054402.html

That is user and subject are expected to use local encoding and not
utf-8, which causes character loss when local encoding is ASCII (e.g. in
the testsuite), and perhaps in other encodings too.

It seems a bit hard problem for a newcomer to mercurial like me so any
advice is very appreciated here.