[PATCH stable] url: fix UnicodeDecodeError on certificate verification error

Yuya Nishihara yuya at tcha.org
Wed Jan 5 09:44:50 CST 2011


Mads Kiilerich wrote:
> On 01/04/2011 05:58 PM, Yuya Nishihara wrote:
> > Yuya Nishihara wrote:
> >> # HG changeset patch
> >> # User Yuya Nishihara<yuya at tcha.org>
> >> # Date 1294159970 -32400
> >> # Branch stable
> >> # Node ID 54d034f4f5092649db64c4cb51af62322a2aca93
> >> # Parent  5d1bb1174047030036fcca00004573fc9f2c0713
> >> url: fix UnicodeDecodeError on certificate verification error
> >>
> >> SSLSockect.getpeercert() returns tuple containing unicodes for 'subject'.
> >
> > How to reproduce:
> >
> > % LANG=ja_JP.utf-8 hg clone https://github.com/mitsuhiko/flask-babel.git
> > ...
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 3: ordinal not in range(128)
> >
> > (needs hg-git extension and non-ascii locale)
> 
> Thanks for the patch.
> 
> It can also be reproduced with 1.7.3 without any extensions, assuming 
> web.cacerts is configured and
> 
>    $ LANG=ja_JP.utf-8 hg id https://google.com/
> 
> > BTW, commonName=*.github.com should match github.com?
> 
> No.

Hi,
My understanding was wrong. I just wondered why https://github.com/
works with the certificate for *.github.com.
I googled and found it's called as "subject alternative names":

http://www.google.com/search?q=ssl+subject+alternative+names

And the certificate data for github.com are:
{'subjectAltName': (('DNS', '*.github.com'), ('DNS', 'github.com')),
 'subject': ((('organizationName', u'*.github.com'),),
             (('organizationalUnitName', u'Domain Control Validated'),),
             (('commonName', u'*.github.com'),))}

Maybe we need to handle 'DNS' entry of 'subjectAltName' like a 'commonName'?

> >> diff --git a/mercurial/url.py b/mercurial/url.py
> >> --- a/mercurial/url.py
> >> +++ b/mercurial/url.py
> >> @@ -10,7 +10,7 @@
> >>   import urllib, urllib2, urlparse, httplib, os, re, socket, cStringIO
> >>   import __builtin__
> >>   from i18n import _
> >> -import keepalive, util
> >> +import keepalive, util, encoding
> >>
> >>   def _urlunparse(scheme, netloc, path, params, query, fragment, url):
> >>       '''Handle cases where urlunparse(urlparse(x://)) doesn't preserve the "//"'''
> >> @@ -498,7 +498,7 @@ def _verifycert(cert, hostname):
> >>       for s in cert.get('subject', []):
> >>           key, value = s[0]
> >>           if key == 'commonName':
> >> -            certname = value.lower()
> >> +            certname = value.lower().encode(encoding.encoding, 'replace')
> 
> I'm not convinced that we should use encoding.encoding and 'replace' 
> here. Encoding to a 8-bit encoding would in principle make the matching 
> a bit ambiguous and could lead to security issues.
> 
> Mercurial doesn't support IDN at all, so shouldn't we just fail if the 
> certificate name can't be converted to ascii?
> 
> It might be even better to convert dnsname to unicode before comparing, 
> and only convert to local encoding in the error message.
> 
> What do you think?

Hmm, I agree that my patch has a security issue.
I think your first suggestion is fairly good and much simple.

It can convert dnsname to unicode or certname to idna, but it just won't work
due to lack of IDN support elsewhere, as you mentioned.
And 99.999% of dnsnames are in ascii, I bet.

> >>               if (certname == dnsname or
> >>                   '.' in dnsname and certname == '*.' + dnsname.split('.', 1)[1]):
> >>                   return None

Yuya,


More information about the Mercurial-devel mailing list