[PATCH] patch: decode e-mail headers
Rafaël Carré
funman at videolan.org
Tue Oct 22 19:25:40 CDT 2013
On 22/10/2013 20:32, Augie Fackler wrote:
> On Tue, Oct 22, 2013 at 02:18:00PM +0200, funman at videolan.org wrote:
>> # HG changeset patch
>> # User Rafaël Carré <funman at videolan.org>
>> # Date 1382444275 -7200
>> # Tue Oct 22 14:17:55 2013 +0200
>> # Branch stable
>> # Node ID e8c0f97e42ca9e09b4000245bd713f03e5d72038
>> # Parent 2c886dedd9021598b6290d95ea0f068731ea4e2b
>> patch: decode e-mail headers
>>
>> Change commits from:
>> user: =?UTF-8?q?Rafa=C3=ABl=20Carr=C3=A9?= <funman at videolan.org>
>> to:
>> user: Rafaël Carré <funman at videolan.org>
>>
>> diff -r 2c886dedd902 -r e8c0f97e42ca mercurial/patch.py
>> --- a/mercurial/patch.py Mon Oct 21 10:50:58 2013 -0700
>> +++ b/mercurial/patch.py Tue Oct 22 14:17:55 2013 +0200
>> @@ -12,6 +12,7 @@
>> # load. This was not a problem on Python 2.7.
>> import email.Generator
>> import email.Parser
>> +from email.header import decode_header
>>
>> from i18n import _
>> from node import hex, short
>> @@ -162,6 +163,25 @@
>> Any item in the returned tuple can be None. If filename is None,
>> fileobj did not contain a patch. Caller must unlink filename when done.'''
>>
>> + def header_decode(h):
>> + '''Decode ?=UTF-8? from e-mail headers.'''
>> + if h is None:
>> + return None
>> + res = ''
>> + pairs = decode_header(h)
>> + if pairs is None:
>> + return None
>> + n = len(pairs)
>> + pair = 0
>> + for p in pairs:
>> + pair += 1
>> + if p[1] == 'utf-8' or p[1] is None:
>> + res += p[0]
>> + if pair < n:
>> + res += ' '
>> +
>> + return res
>> +
>> # attempt to detect the start of a patch
>> # (this heuristic is borrowed from quilt)
>> diffre = re.compile(r'^(?:Index:[ \t]|diff[ \t]|RCS file: |'
>> @@ -174,8 +194,8 @@
>> try:
>> msg = email.Parser.Parser().parse(fileobj)
>>
>> - subject = msg['Subject']
>> - user = msg['From']
>> + subject = header_decode(msg['Subject'])
>> + user = header_decode(msg['From'])
>> if not subject and not user:
>> # Not an email, restore parsed headers if any
>> subject = '\n'.join(': '.join(h) for h in msg.items()) + '\n'
>
> Can I get you to add a simple test to one of the existing 'hg import'
> test cases so we don't break this in the future?
Hi, here's the test:
  $ cat > utf8.patch <<EOF
  > From: =?UTF-8?q?=C3=AB?=
  > Subject: patch
  > diff --git /dev/null b/a
  > --- /dev/null
  > +++ b/a
  > @@ -0,0 +1,1 @@
  > +a
  > EOF
  $ hg init utf
  $ cd utf
  $ hg import ../utf8.patch
  $ hg log | grep ^user -
  user: ë
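For reference, the decoding this test expects can be sketched in Python 3 as follows. This is not the patch itself (which targets Python 2's email module); `decode_mime_header` is a hypothetical helper name, and falling back to ASCII with replacement characters for unlabeled or unknown charsets is an assumption of this sketch.

```python
from email.header import decode_header


def decode_mime_header(value):
    """Decode RFC 2047 encoded words (e.g. =?UTF-8?q?...?=) into text.

    Hypothetical helper for illustration; not part of Mercurial.
    """
    if value is None:
        return None
    parts = []
    for chunk, charset in decode_header(value):
        # decode_header() yields bytes for encoded words and may yield
        # str for headers that contain no encoded words at all.
        if isinstance(chunk, bytes):
            try:
                chunk = chunk.decode(charset or 'ascii', 'replace')
            except LookupError:
                # Unknown charset label: assumption, fall back to ASCII.
                chunk = chunk.decode('ascii', 'replace')
        parts.append(chunk)
    return ''.join(parts)
```

With the From header of the test patch, `decode_mime_header('=?UTF-8?q?=C3=AB?=')` yields the single character ë, and a plain header such as `'patch'` is returned unchanged.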
It currently fails with:
--- /media/dev/hg/tests/test-import.t
+++ /media/dev/hg/tests/test-import.t.err
@@ -1169,5 +1169,10 @@
   $ hg init utf
   $ cd utf
   $ hg import ../utf8.patch
+  applying ../utf8.patch
+  transaction abort!
+  rollback completed
+  abort: decoding near '\xc3\xab': 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)! (esc)
+  [255]
   $ hg log | grep ^user -
-  user: ë
+  [1]
Something goes wrong in hg import with LANG=C, and I'm not sure why.
Also, why is the backtrace hidden here?
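The error text in the abort message is the one Python produces when UTF-8 bytes are decoded as ASCII. A minimal Python 3 sketch reproduces it outside Mercurial. That hg ends up using ASCII as its local encoding under LANG=C is an assumption here, not something confirmed in this thread, and `try_decode` is a hypothetical helper.

```python
# UTF-8 bytes for the character in the test patch's From header
# (=C3=AB in quoted-printable form).
RAW = b'\xc3\xab'


def try_decode(data, encoding):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return 'error: %s' % exc


as_utf8 = try_decode(RAW, 'utf-8')    # decodes to one character
as_ascii = try_decode(RAW, 'ascii')   # fails on the first byte, 0xc3
```

Here `as_ascii` holds an "'ascii' codec can't decode byte 0xc3 in position 0" message, the same wording as in the failing test output, while `as_utf8` decodes cleanly.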