I have $http_proxy set to be able to go through our firewall, and it works for regular http: repositories. If these repositories refer to additional http: (sub)repositories, the environment variable http_proxy apparently is not used, as hg hangs while pulling from the subrepo (traceback from ctrl- c when it hangs gives: File "/usr/lib64/python2.4/httplib.py", line 626, in connect self.sock.connect(sa) ) However, if I state the proxy configuration explicitly to hg, like $ hg --config http_proxy.host=www.ourproxy.com:80 clone http://public.ict.sintef.no/opm/hg/dune-cornerpoint then it can pull the subrepository without any problems. Using hg version 1.6.3, Redhat Enterprise Linux 5.4, Python 2.4.
The only place Mercurial reads http_proxy.host it falls back to using the http_proxy variable, so I can't see how what you describe should be possible. Can you verify an extra time that the report is 100% correct? Does it work (without proxy) if you use the subrepos on a network where http_proxy isn't needed? Can you give a full test case? (Preferably as a Mercurial test, for example combining tests/test-http-proxy.t and tests/test-subrepo-relative-path.t - http://mercurial.selenic.com/wiki/WritingTests )
I can still reproduce it. I guess writing a test would be difficult as it requires a remote http-server and a proxy in between (and a firewall blocking regular port 80 traffic). Going outside the firewalled network, I don't see these problems. $ unset http_proxy $ hg clone http://public.ict.sintef.no/opm/hg/dune-cornerpoint #<ctrl-c> interrupted! # This is expected, port 80 traffic is blocked by firewall when proxy is not used. # Let's try again, now specifying http_proxy, it *should* work, but doesn't. $ export http_proxy=http://www-proxy.statoil.no:80/ $ hg clone http://public.ict.sintef.no/opm/hg/dune-cornerpoint destination directory: dune-cornerpoint requesting all changes adding changesets adding manifests adding file changes added 832 changesets with 1865 changes to 287 files updating to branch default pulling subrepo grid/preprocess from http://public.ict.sintef.no/opm/hg/cpgpreprocess <ctrl-c> interrupted! # Last attempt, specifying proxy config directly on hg command line, this works for some reason. $ rm -rf dune-cornerpoint/ $ hg --config http_proxy.host=www-proxy.statoil.no:80 clone http://public.ict.sintef.no/opm/hg/dune-cornerpoint destination directory: dune-cornerpoint requesting all changes adding changesets adding manifests adding file changes added 832 changesets with 1865 changes to 287 files updating to branch default pulling subrepo grid/preprocess from http://public.ict.sintef.no/opm/hg/cpgpreprocess requesting all changes adding changesets adding manifests adding file changes added 73 changesets with 167 changes to 28 files 116 files updated, 0 files merged, 0 files removed, 0 files unresolved
Right, got it. The cleanup from http://selenic.com/repo/hg//rev/9294dce4b633 is usually done _after_ the proxyhandler has read the environment variables, but with subrepos it comes back again later. I'm not sure how we should fix this. Hacking os.environ is ugly and fragile. Perhaps we should read all relevant variables initially and store them instead of reading on-the-fly. Storing in config would be an obvious choice, but they are perhaps not sufficiently global. (For testing we can use tinyproxy.py as test-http-proxy.t does and either hack the proxy to rewrite something so we know it is used, or just check the logs.)
Attached is one first stab at writing a test-case, that tries to verify that the proxy is actually used for all access.
Attached is an updated test-subrepo-http-proxy.t Given the following patch to url.py, it will pass, but not assuming this is a good solution. I tried to guess the point about del os.environ was to be able to override http_proxy from environment if it was supplied using -- config, but this did not seem to work anyway before I deleted del os.environ, so there is more to it. --- a/mercurial/url.py Sat Oct 30 21:02:27 2010 +0200 +++ b/mercurial/url.py Sat Oct 30 21:11:02 2010 +0200 @@ -222,15 +222,6 @@ else: proxies = {} - # urllib2 takes proxy values from the environment and those - # will take precedence if found, so drop them - for env in ["HTTP_PROXY", "http_proxy", "no_proxy"]: - try: - if env in os.environ: - del os.environ[env] - except OSError: - pass - urllib2.ProxyHandler.__init__(self, proxies) self.ui = ui
FWIW, the change looks good to me. urllib2 only uses the environment variables if the ProxyHandler is initialised with None. url.py's proxyhandler always passes a dict of proxies (which may be empty), thus bypassing the environment variable check. The code deleting the environment variables was added a long time ago (revision 9294dce4b633), before the empty-dict mechanism was introduced (revision b3afa6725082), and is no longer relevant.
Fixed by http://selenic.com/repo/hg/rev/02734d2baa79 Renato Cunha <renatoc@gmail.com> url: Remove the proxy env variables only when needed (issue2451) (please test the fix)
--- Bug imported by bugzilla@serpentine.com 2012-05-12 09:13 EDT --- This bug was previously known as _bug_ 2451 at http://mercurial.selenic.com/bts/issue2451 Imported an attachment (id=1475) Imported an attachment (id=1476)
Fixed by https://mercurial-scm.org/repo/hg/rev/69459fdf3b1b Pulkit Goyal <7895pulkit@gmail.com> url: remove unnecessary deletion of environ variables while dealing with proxy Currently we delete proxy environment variables if ui.config contains proxy values. This is unnecessary because urllib2.ProxyHandler class only reads proxy from environment it is initialised by None. But url.py never passes None, so there is no point urllib2 will take environment variables in account. This also prevents deleting environment variables which is not safe. This code was introduced while resolving Bug 2451 even it is in one of comments (sixth one) on bug that we can safely remove this part. Link to bug : https://bz.mercurial-scm.org/show_bug.cgi?id=2451 (please test the fix)