Bug 3905 - Push hangs on big chunk (Windows 2008R2 + IIS)
Summary: Push hangs on big chunk (Windows 2008R2 + IIS)
Status: RESOLVED FIXED
Alias: None
Product: Mercurial
Classification: Unclassified
Component: hgweb
Version: 2.6.1
Hardware: PC Windows
Importance: urgent bug
Assignee: Augie Fackler
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-04-23 16:43 UTC by Jesus Vidal
Modified: 2014-02-04 14:30 UTC
CC List: 13 users

See Also:
Python Version: ---


Attachments
Comparing "about" windows for TortoiseHg 2.4.1 and 2.4.2 (47.04 KB, image/png)
2013-05-09 23:10 UTC, James O'Cull

Description Jesus Vidal 2013-04-23 16:43 UTC
When a user tries to push a big chunk, the push hangs at approximately 30000 kB. I've configured IIS to allow big files as described here http://stackoverflow.com/questions/3068627/mercurial-client-error-255-and-http-error-404-when-attempting-to-push-large-file/3079399#3079399 but the behaviour is the same.

I've tried using a timeout of 120 seconds. When the push hangs, it resumes once that timeout expires and pushes more bytes (approximately the next 30000 kB).

This happens only over HTTPS. When I configure the hg repository to allow push without SSL, the push works fine, but I need to use SSL.

There is no problem with IIS size, it seems a problem with hgwebdir or perhaps Python?
Comment 1 Matt Mackall 2013-04-23 18:07 UTC
What version of Mercurial, Windows, IIS? What is the working directory size of the largest file in your repository?

"There is no problem with IIS size, it seems a problem with hgwebdir or perhaps Python?"

My money is on IIS being to blame. But you can prove me wrong by making 'hg serve' do the same thing.
Comment 2 Jesus Vidal 2013-04-24 03:20 UTC
Mercurial 2.5.4 + Windows Server 2008 R2 and IIS 7.5. 

The commit changes a group of files of different sizes (the largest is 6 MB, but the total size of the group is 65 MB). When the push is bigger than approximately 30000 kB, it hangs, but as I said, this only happens over HTTPS; if I configure hg to allow push without SSL (HTTP), it works fine.

With hg serve I can only test HTTP, and HTTP works fine on IIS, so how can I test HTTPS?
Comment 3 Matt Mackall 2013-04-24 12:56 UTC
Do you have any corresponding errors reported in your IIS error log (not the same as the access log)?
Comment 4 Jesus Vidal 2013-04-24 17:25 UTC
I think that you want a log from %SystemDrive%\inetpub\logs\LogFiles

This is what I found:
#Software: Microsoft Internet Information Services 7.5
#Version: 1.0
#Date: 2013-04-24 21:12:20
#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status time-taken
2013-04-24 21:12:20 192.168.1.146 GET / - 80 - 192.168.1.1 - 200 0 64 26996
2013-04-24 21:13:11 192.168.1.146 GET /hg cmd=capabilities 443 - 192.168.1.2 mercurial/proto-1.0 200 0 0 133
2013-04-24 21:13:11 192.168.1.146 GET /hg cmd=batch 443 - 192.168.1.2 mercurial/proto-1.0 200 0 0 295
2013-04-24 21:13:19 192.168.1.146 GET /hg cmd=getbundle 443 - 192.168.1.2 mercurial/proto-1.0 200 0 0 8536
2013-04-24 21:13:19 192.168.1.146 GET /hg cmd=listkeys 443 - 192.168.1.2 mercurial/proto-1.0 200 0 0 56
2013-04-24 21:13:19 192.168.1.146 GET /hg cmd=listkeys 443 - 192.168.1.2 mercurial/proto-1.0 200 0 0 9
2013-04-24 21:15:11 192.168.1.146 GET /hg cmd=capabilities 443 - 192.168.1.2 mercurial/proto-1.0 200 0 0 11
2013-04-24 21:15:11 192.168.1.146 GET /hg cmd=batch 443 - 192.168.1.2 mercurial/proto-1.0 200 0 0 9
2013-04-24 21:15:11 192.168.1.146 GET /hg cmd=branchmap 443 - 192.168.1.2 mercurial/proto-1.0 200 0 0 90
2013-04-24 21:17:29 192.168.1.146 POST /hg cmd=unbundleHttpExtensionProc+function+failed! 443 - 192.168.1.2 mercurial/proto-1.0 004 0 64 135507
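The `#Fields:` header in the log above names each column, so the entries can be split into named fields to pick out the failing request. A minimal sketch, assuming the W3C extended format shown in this log (space-separated values, with a `#Fields:` line before the entries); `parse_w3c_log` is a hypothetical helper name, not part of IIS or Mercurial:

```python
# Two entries copied from the log above, plus its #Fields header.
SAMPLE = """\
#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status time-taken
2013-04-24 21:13:11 192.168.1.146 GET /hg cmd=capabilities 443 - 192.168.1.2 mercurial/proto-1.0 200 0 0 133
2013-04-24 21:17:29 192.168.1.146 POST /hg cmd=unbundleHttpExtensionProc+function+failed! 443 - 192.168.1.2 mercurial/proto-1.0 004 0 64 135507
"""


def parse_w3c_log(lines):
    """Yield one dict per log entry, keyed by the names in the #Fields line."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            yield dict(zip(fields, line.split()))


# Flag entries whose Win32 status is non-zero.
for entry in parse_w3c_log(SAMPLE.splitlines()):
    if entry["sc-win32-status"] != "0":
        print(entry["cs-method"], entry["cs-uri-query"],
              entry["sc-win32-status"], entry["time-taken"])
```

With this sample, the only flagged entry is the `cmd=unbundle` POST, with `sc-win32-status` 64 and `time-taken` 135507. IIS records `time-taken` in milliseconds, so the request ran for a little over two minutes.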
Comment 5 Jesus Vidal 2013-04-24 17:29 UTC
Forgot to say that I've changed the real IPs in the log.
Comment 6 James O'Cull 2013-05-09 23:06 UTC
Believe it or not... it's actually **NOT** a Mercurial problem; it's a Python problem!

If you roll back to Python 2.6.6 or something it's fine. However 2.7.4 is completely busted when it comes to pushing via HTTPS.

It happens for me with these conditions:

- I am using TortoiseHg 2.4.2 or newer (linked against Hg 2.2.3 or newer)
-- If I use TortoiseHg 2.4.1 everything is fine. It's a client-side problem.
- I am trying to push via HTTPS to IIS using hgweb.cgi
-- HTTP works just fine; it only happens when using HTTPS
- I am pushing a payload larger than 30MB

I know it's not an IIS problem because I've put IIS through the wringer trying to find what was wrong with it (I was blaming it entirely).

See the forum thread here:
http://forums.iis.net/t/1197751.aspx/1?IIS+7+5+CGI+SSL+Problems+with+upload+limit+30MB+cap+
Comment 7 James O'Cull 2013-05-09 23:10 UTC
Created attachment 1726
Comparing "about" windows for TortoiseHg 2.4.1 and 2.4.2

Here are the "about" windows for the breaking versions
Comment 8 James O'Cull 2013-05-09 23:21 UTC
My testing shows that this problem was introduced in Python 2.7.3+

If you use 2.7.2 or earlier it is fine.
Comment 9 Jesus Vidal 2013-05-10 03:23 UTC
Yes! You found the problem! I've tried TortoiseHg 2.4.1 (compiled with Python 2.6.6) and it works fine! So it's a Python problem!

Thank you so much!
Comment 10 James O'Cull 2013-05-10 09:25 UTC
There is a problem in the SSL module in 2.7.3/4; our guess is that something about it doesn't get along with IIS's SSL implementation.

You can work around this problem in TortoiseHg by copying _ssl.pyd from a Python 2.7.2 install into your TortoiseHg directory. I think the file has to match your install's architecture (x86 or x64), so be careful which one you copy over.
Comment 11 Steve Borho 2013-05-10 12:14 UTC
Has a bug been opened for this in Python?  I would much rather upgrade my build machine forward to 2.7.5 than downgrade back to 2.7.2
Comment 12 James O'Cull 2013-05-10 12:38 UTC
It was dismissed as a "security upgrade". http://bugs.python.org/issue13885

You can however update your server to disable SSL V2 as detailed here: http://stackoverflow.com/a/16486104/97964
Comment 13 Jesus Vidal 2013-05-10 13:51 UTC
(In reply to comment #11)
Comment 14 Jesus Vidal 2013-05-10 13:53 UTC
(In reply to comment #13)
(In reply to comment #11)
I've opened the bug http://bugs.python.org/issue17948

They said:
Hello Jesus, this report is far too vague to make anything about it. You should try to diagnose the issue further, here are some ideas:
- check whether it happens with another server than IIS
- try if you can reproduce without Mercurial being involved (simply write a script using httplib or urllib2 to push a file to the server)
- try to see what happens over the wire using e.g. Wireshark

Bonus points if you can find an easy way to reproduce, short of hosting a large Mercurial repo on a Windows server :-)


PS: Sorry for the double post, I replied by mistake.
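The second suggestion quoted above (reproduce without Mercurial by pushing a file from a script) could look roughly like this. A minimal sketch only: it uses Python 3's `http.client`, the successor to the `httplib` module named in the quote, and the helper names `make_payload` and `timed_post` are hypothetical. Nothing here touches the network until `timed_post` is called.

```python
import http.client
import os
import ssl
import time


def make_payload(size_mb):
    """Return size_mb megabytes of random (incompressible) bytes."""
    return os.urandom(size_mb * 1024 * 1024)


def timed_post(host, path, payload, timeout=120, context=None):
    """POST payload over HTTPS; return (HTTP status, seconds elapsed)."""
    context = context or ssl.create_default_context()
    conn = http.client.HTTPSConnection(host, timeout=timeout, context=context)
    start = time.monotonic()
    try:
        conn.request("POST", path, body=payload,
                     headers={"Content-Type": "application/octet-stream"})
        status = conn.getresponse().status
    finally:
        conn.close()
    return status, time.monotonic() - start
```

For example, `timed_post("hg.example.com", "/upload", make_payload(40))` would send a 40 MB body, which is above the ~30 MB mark reported here. The host and path are placeholders; the server needs an endpoint that accepts a POST body of that size.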
Comment 15 James O'Cull 2013-05-10 14:13 UTC
Thanks! I went ahead and added more details to it, but I am not hopeful for any changes since it was originally part of a "fix".
Comment 16 Steve Borho 2013-05-10 22:42 UTC
I've read Antoine's response in the Python bug report, which can be summarized plainly as "I hope you can find a good workaround for IIS".

So while I'm willing to back-up or patch SSL on my build box, TortoiseHg users are not the only Mercurial users who use Python 2.7.4 and push to Windows hosts.  So it seems we will need a workaround in Mercurial of some fashion.

Do we detect IIS and force SSLv3?  Make a https.force_sslv3 config option?
Comment 17 James O'Cull 2013-05-11 01:39 UTC
Could it try SSL v3 first and fallback to v2 if that is no good?

Comment 18 Augie Fackler 2013-05-11 20:47 UTC
Having squinted at SSL docs some, I think I'd actually like to make hg default to SSLv3 only, and have a flag to regain the older (less secure) behavior. That probably violates our compat rules though. Matt?
Comment 19 Augie Fackler 2013-05-15 15:11 UTC
I'll draft a patch to disable SSLv2 by default, which should mitigate this.
Comment 20 Augie Fackler 2013-05-15 15:17 UTC
Er, no. That's already been done in modern Pythons. I'll likely do that for us anyway, but the problem is an attack mitigation that happens to break IIS's SSL support. I think at this point, it's really a bug in IIS that requires IIS to be insecure.

Really not sure what we should do about this.
Comment 21 Augie Fackler 2013-05-15 15:19 UTC
Steve, I'm a little unclear: would forcing SSLv3 fix IIS? I thought we were already using SSLv3.
Comment 22 Matt Mackall 2013-05-15 15:44 UTC
We are puzzled.

By reports, disabling SSLv2 in Python 2.7.3 by hacking hg fixes the problem:

 http://stackoverflow.com/a/16486104/97964

But also, SSLv2 is supposed to ALREADY be disabled in Python 2.7.3:

 http://hg.python.org/cpython/rev/f9122975fd80

Are you perchance using:

 [ui]
 usehttp2 = True
Comment 23 Antoine Pitrou 2013-05-15 16:41 UTC
Yes, I don't know what happens exactly (using Wireshark may help diagnosing). 
It's possible that disabling the SSLv2 ciphers doesn't disable the SSLv2 Hello, or perhaps there's something else I'm missing.

It may be reasonable to default to TLSv1 actually, it's quite old already (RFC 2246 says January 1999), and let users downgrade using a configuration option.
Comment 24 Matt Mackall 2013-05-15 18:27 UTC
Can someone test this fix from Augie against Python 2.7.4, please?

diff --git a/mercurial/httpconnection.py b/mercurial/httpconnection.py
--- a/mercurial/httpconnection.py
+++ b/mercurial/httpconnection.py
@@ -279,6 +279,13 @@
             kwargs['keyfile'] = keyfile
         if certfile:
             kwargs['certfile'] = certfile
+        try:
+            import ssl
+            kwargs['ssl_version'] = ssl.PROTOCOL_SSLv3
+        except ImportError:
+            # Python < 2.6 won't have an ssl module, so we can't force SSLv3.
+            pass
+
 
         kwargs.update(sslutil.sslkwargs(self.ui, host))
Comment 25 Zach Mason 2013-05-16 12:08 UTC
Hi,

Matt, I have confirmed the patch above fixes the problem if usehttp2 = true is set, but it still hangs up with the default settings.

I had never seen that option before, but saw that the patch only applies to that section of the code. Curiously, when usehttp2 is on without the patch, it hangs up at ~15MB, halfway to the point where the default stall occurs. I assume this is because we are using basic auth and it is retransmitting the data twice with the default settings.

To fix it when usehttp2 is off, a similar snippet of code should be inserted in sslutil.py inside ssl_wrap_socket, or the args to ssl_wrap_socket should be modified so it can be done from url.py.

As for the root cause of the problem, I can confirm it isn't exactly SSLv2 that causes the stall: it occurs under any protocol whenever a cipher that uses CBC is negotiated. I can disable all CBC-based ciphers using IISCrypto to manipulate the IIS registry settings, leaving SSLv2 enabled, and everything works. I'm not an SSL expert, so I am not familiar with how the cipher set is negotiated. The relevant docs from http://www.openssl.org/docs/ssl/SSL_CTX_set_options.html explain the issue some:

SSL_OP_DONT_INSERT_EMPTY_FRAGMENTS
Disables a countermeasure against a SSL 3.0/TLS 1.0 protocol vulnerability affecting CBC ciphers, which cannot be handled by some broken SSL implementations. This option has no effect for connections using other ciphers.

Python 2.7.3 stopped setting this option, which switches the countermeasure on. I would think that providing a different cipher list to ssl.wrap_socket could also work around the problem, but I don't know enough about formatting the cipher string to get it to work.

Changing the default protocol from PROTOCOL_SSLv23 to PROTOCOL_SSLv3 fixes the issue from the client side, presumably because it forces a cipher that doesn't trigger IIS's buggy behavior. Trying PROTOCOL_TLSv1 does NOT fix the problem, I assume because it defaults to a block cipher.

A more accurate "fix" from the client side is probably to specify a stream cipher. I am not an expert at crafting cipher lists, but I can confirm that inserting RC4 at the head of the list also fixes the problem:

sslsocket = ssl.wrap_socket(sock, keyfile, certfile,
                cert_reqs=cert_reqs, ca_certs=ca_certs, 
                ciphers='RC4:!aNULL:!eNULL:!LOW:!EXPORT:!SSLv2',
                )

I hope this helps. Let me know if you need more details or need me to run any more tests.
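To check what a cipher string actually selects before trying it against a server, a context can be asked to list its ciphers. A minimal sketch, assuming Python 3.6 or newer (for `SSLContext.get_ciphers`); `make_context` and `cipher_names` are hypothetical helpers. The example string picks AEAD (GCM) suites rather than RC4, because current OpenSSL builds typically no longer ship RC4, so the `RC4:...` string above may select nothing on a modern install.

```python
import ssl


def make_context(cipher_string):
    """Return a client context restricted to an OpenSSL cipher string."""
    ctx = ssl.create_default_context()
    # Raises ssl.SSLError if the string matches no available cipher.
    ctx.set_ciphers(cipher_string)
    return ctx


def cipher_names(ctx):
    """Return the names of the ciphers the context will offer."""
    return [c["name"] for c in ctx.get_ciphers()]


# Non-CBC example: ECDHE key exchange with AES-GCM, no anonymous/null suites.
ctx = make_context("ECDHE+AESGCM:!aNULL:!eNULL")
for name in cipher_names(ctx):
    # TLS 1.3 suites are listed too; set_ciphers() does not control them.
    print(name)
```

A context built this way is applied with `ctx.wrap_socket(sock, server_hostname=host)`, the replacement for the module-level `ssl.wrap_socket` call shown above.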
Comment 26 Matt Mackall 2013-07-19 13:42 UTC
Bumping up to urgent.
Comment 27 Augie Fackler 2013-07-24 17:09 UTC
Mailed some patches just now.
Comment 28 HG Bot 2013-07-25 03:45 UTC
Fixed by http://selenic.com/repo/hg/rev/42fcb2f7787d
Augie Fackler <raf@durin42.com>
httpclient: update to revision 9517a2b56fe9 of httpplus (issue3905)

Includes upstream change "socketutil: force SSLv3 by default, as it is
safer" which should fix issue 3905.

(please test the fix)
Comment 29 HG Bot 2013-07-25 03:45 UTC
Fixed by http://selenic.com/repo/hg/rev/074bd02352c0
Augie Fackler <raf@durin42.com>
sslutil: force SSLv3 on Python 2.6 and later (issue3905)

We can't (easily) force SSL version on older Pythons, but on 2.6 and
later we can force SSLv3, which is safer and widely supported. This
also appears to work around a bug in IIS detailed in issue 3905.

(please test the fix)
Comment 30 Matt Mackall 2013-08-01 23:57 UTC
This is now part of 2.7, please test if you can.
Comment 31 Thijs Alkemade 2013-09-17 14:48 UTC
This change broke connectivity to servers that actually care about security.

SSLv3 is not recommended anymore:

"SSL v3 is very old and obsolete. Because it lacks some key features and because virtually all clients support TLS 1.0 and better, you should not support SSL v3 unless you have a very good reason." (https://www.ssllabs.com/projects/best-practices/)

Forcing Mercurial to *only* use SSLv3 is stupid. If you insist on supporting SSLv3, then you should at least still *allow* TLS 1.0, 1.1 and 1.2. If there are servers like in the OP that break with TLS 1.0+, then there should be a flag to force Mercurial to use a specific version of TLS.
Comment 32 Matt Mackall 2013-09-17 16:06 UTC
The intent of the patches is to disable SSLv2 and force _at least_ SSLv3 (many servers still don't support TLS), which should be clear from reading this report and the patches. Prior to this, Python defaulted to using SSLv23_method() in OpenSSL.

Looking at the OpenSSL source (ssl/s23_meth.c), it appears that with appropriate compile flags, SSLv23_method will provide v2, v3, or TLS, while the SSLv3 method does not provide a corresponding "or higher" semantic. Unfortunately, Python provides very narrow access to OpenSSL, and it may be difficult to actually achieve the desired effect.

New bug opened as bug #4039, this bug left in state TESTING.
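Later Python versions expose the "or higher" semantic described above directly. A minimal sketch, assuming Python 3.7 or newer built against an OpenSSL that supports `SSLContext.minimum_version`; this is not what the 2013 patches did, and `make_client_context` is a hypothetical helper:

```python
import ssl


def make_client_context(minimum=ssl.TLSVersion.TLSv1_2):
    """Return a client context that negotiates `minimum` or any newer version."""
    ctx = ssl.create_default_context()
    # Sets a floor only: the highest version both peers support is still used.
    ctx.minimum_version = minimum
    return ctx


ctx = make_client_context()
print(ctx.minimum_version.name)
```

Setting a floor instead of pinning one protocol avoids the trade-off discussed in this thread, where forcing a single version also locks out servers that only accept newer ones.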
Comment 33 Matt Mackall 2013-10-19 16:29 UTC
Assuming fixed.
Comment 34 Michael Hallock 2014-02-04 14:30 UTC
I still seem to have this issue with Mercurial 2.8.2 and Python 2.7.6. Large pushes still cut off right around the 30 MB mark.

Using TortoiseHg 2.10.2 (Base mercurial 2.8.2), both from GUI and command line.