[PATCH] patch: decode e-mail headers

Rafaël Carré funman at videolan.org
Tue Oct 22 19:25:40 CDT 2013


On 22/10/2013 20:32, Augie Fackler wrote:
> On Tue, Oct 22, 2013 at 02:18:00PM +0200, funman at videolan.org wrote:
>> # HG changeset patch
>> # User Rafaël Carré <funman at videolan.org>
>> # Date 1382444275 -7200
>> #      Tue Oct 22 14:17:55 2013 +0200
>> # Branch stable
>> # Node ID e8c0f97e42ca9e09b4000245bd713f03e5d72038
>> # Parent  2c886dedd9021598b6290d95ea0f068731ea4e2b
>> patch: decode e-mail headers
>>
>>     Change commits from:
>> user:        =?UTF-8?q?Rafa=C3=ABl=20Carr=C3=A9?= <funman at videolan.org>
>>     to:
>> user:        Rafaël Carré <funman at videolan.org>
>>
>> diff -r 2c886dedd902 -r e8c0f97e42ca mercurial/patch.py
>> --- a/mercurial/patch.py	Mon Oct 21 10:50:58 2013 -0700
>> +++ b/mercurial/patch.py	Tue Oct 22 14:17:55 2013 +0200
>> @@ -12,6 +12,7 @@
>>  # load. This was not a problem on Python 2.7.
>>  import email.Generator
>>  import email.Parser
>> +from email.header import decode_header
>>
>>  from i18n import _
>>  from node import hex, short
>> @@ -162,6 +163,25 @@
>>      Any item in the returned tuple can be None. If filename is None,
>>      fileobj did not contain a patch. Caller must unlink filename when done.'''
>>
>> +    def header_decode(h):
>> +        '''Decode =?UTF-8?...?= encoded words from e-mail headers.'''
>> +        if h is None:
>> +            return None
>> +        res = ''
>> +        pairs = decode_header(h)
>> +        if pairs is None:
>> +            return None
>> +        n = len(pairs)
>> +        pair = 0
>> +        for p in pairs:
>> +            pair += 1
>> +            if p[1] == 'utf-8' or p[1] is None:
>> +                res += p[0]
>> +                if pair < n:
>> +                    res += ' '
>> +
>> +        return res
>> +
>>      # attempt to detect the start of a patch
>>      # (this heuristic is borrowed from quilt)
>>      diffre = re.compile(r'^(?:Index:[ \t]|diff[ \t]|RCS file: |'
>> @@ -174,8 +194,8 @@
>>      try:
>>          msg = email.Parser.Parser().parse(fileobj)
>>
>> -        subject = msg['Subject']
>> -        user = msg['From']
>> +        subject = header_decode(msg['Subject'])
>> +        user = header_decode(msg['From'])
>>          if not subject and not user:
>>              # Not an email, restore parsed headers if any
>>              subject = '\n'.join(': '.join(h) for h in msg.items()) + '\n'
> 
> Can I get you to add a simple test to one of the existing 'hg import'
> test cases so we don't break this in the future?

Hi, here's the test:

  $ cat > utf8.patch <<EOF
  > From: =?UTF-8?q?=C3=AB?=
  > Subject: patch
  > diff --git /dev/null b/a
  > --- /dev/null
  > +++ b/a
  > @@ -0,0 +1,1 @@
  > +a
  > EOF
  $ hg init utf
  $ cd utf
  $ hg import ../utf8.patch
  $ hg log | grep ^user -
  user:        ë
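
For reference, a minimal sketch of what decoding such a header amounts to, written for Python 3 with only the standard library (the patch itself targets Python 2). The name decode_mime_header is hypothetical and is not part of Mercurial. Unlike the patch, it does not drop parts whose charset is something other than UTF-8.

```python
from email.header import decode_header, make_header


def decode_mime_header(value):
    """Return the header with RFC 2047 encoded words decoded to text.

    Hypothetical helper for illustration only; not Mercurial code.
    """
    if value is None:
        return None
    # decode_header() splits the header into (data, charset) pairs;
    # make_header() reassembles them into a Header whose str() is the
    # decoded text.
    return str(make_header(decode_header(value)))


short_user = decode_mime_header('=?UTF-8?q?=C3=AB?=')
full_user = decode_mime_header(
    '=?UTF-8?q?Rafa=C3=ABl=20Carr=C3=A9?= <funman at videolan.org>')
plain_subject = decode_mime_header('patch')
```

Here short_user is the header from the test above and full_user is the one from the commit message; a header with no encoded words, such as plain_subject, should pass through unchanged.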

It currently fails with:

--- /media/dev/hg/tests/test-import.t
+++ /media/dev/hg/tests/test-import.t.err
@@ -1169,5 +1169,10 @@
   $ hg init utf
   $ cd utf
   $ hg import ../utf8.patch
+  applying ../utf8.patch
+  transaction abort!
+  rollback completed
+  abort: decoding near '\xc3\xab': 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)! (esc)
+  [255]
   $ hg log | grep ^user -
-  user:        ë
+  [1]


Something goes wrong in hg import with LANG=C, and I'm not sure why.
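
The codec error in the output can be reproduced in isolation. The sketch below (Python 3, standard library only) is a guess at the mechanism, not a confirmed diagnosis: it assumes that with LANG=C the local encoding resolves to ASCII, so the UTF-8 bytes produced by the header decoding cannot be converted.

```python
# The two bytes that the decoded From: header contains (UTF-8 for 'ë').
raw = b'\xc3\xab'

# Assumption: under LANG=C the local encoding is ASCII, which cannot
# represent any byte above 0x7f.
try:
    raw.decode('ascii')
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

# With a UTF-8 locale the same bytes decode without error.
decoded = raw.decode('utf-8')
```

If that assumption holds, running the test under a UTF-8 locale should avoid the abort, and Mercurial's global --traceback option may show where the abort is raised.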

Why is the backtrace hidden here?

