Bug 2451 - http_proxy env variable not respected for subrepository cloning
Summary: http_proxy env variable not respected for subrepository cloning
Status: RESOLVED FIXED
Alias: None
Product: Mercurial
Classification: Unclassified
Component: Mercurial (show other bugs)
Version: unspecified
Hardware: All All
: normal bug
Assignee: Stefan Thalauer
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-10-25 06:49 UTC by Håvard Berland
Modified: 2017-10-04 16:55 UTC (History)
9 users (show)

See Also:
Python Version: ---


Attachments
(34 bytes, application/x-troff)
2010-10-28 14:47 UTC, Håvard Berland
Details
(34 bytes, application/x-troff)
2010-10-30 13:45 UTC, Håvard Berland
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Håvard Berland 2010-10-25 06:49 UTC
I have $http_proxy set to be able to go through our firewall, and it works 
for regular http: repositories. If these repositories refer to additional 
http: (sub)repositories, the environment variable http_proxy apparently is 
not used, as hg hangs while pulling from the subrepo (traceback from ctrl-
c when it hangs gives:
  File "/usr/lib64/python2.4/httplib.py", line 626, in connect
    self.sock.connect(sa)
)

However, if I state the proxy configuration explicitly to hg, like

$ hg --config http_proxy.host=www.ourproxy.com:80 clone 
http://public.ict.sintef.no/opm/hg/dune-cornerpoint

then it can pull the subrepository without any problems.

Using hg version 1.6.3, Redhat Enterprise Linux 5.4, Python 2.4.
Comment 1 kiilerix 2010-10-25 07:12 UTC
The only place Mercurial reads http_proxy.host it falls back to using the
http_proxy variable, so I can't see how what you describe should be possible.

Can you verify an extra time that the report is 100% correct?

Does it work (without proxy) if you use the subrepos on a network where
http_proxy isn't needed?

Can you give a full test case? (Preferably as a Mercurial test, for example
combining tests/test-http-proxy.t and tests/test-subrepo-relative-path.t -
http://mercurial.selenic.com/wiki/WritingTests )
Comment 2 Håvard Berland 2010-10-26 05:07 UTC
I can still reproduce it. I guess writing a test would be difficult as it
requires a remote http-server and a proxy in between (and a firewall
blocking regular port 80 traffic).

Going outside the firewalled network, I don't see these problems.

$ unset http_proxy
$ hg clone http://public.ict.sintef.no/opm/hg/dune-cornerpoint
#<ctrl-c>
interrupted!
# This is expected, port 80 traffic is blocked by firewall when proxy is not
used.

# Let's try again, now specifying http_proxy, it *should* work, but doesn't.
$ export http_proxy=http://www-proxy.statoil.no:80/
$ hg clone http://public.ict.sintef.no/opm/hg/dune-cornerpoint
destination directory: dune-cornerpoint
requesting all changes
adding changesets
adding manifests
adding file changes
added 832 changesets with 1865 changes to 287 files
updating to branch default
pulling subrepo grid/preprocess from
http://public.ict.sintef.no/opm/hg/cpgpreprocess
<ctrl-c>
interrupted!


# Last attempt, specifying proxy config directly on hg command line, this
works for some reason.

$ rm -rf dune-cornerpoint/
$ hg --config http_proxy.host=www-proxy.statoil.no:80 clone
http://public.ict.sintef.no/opm/hg/dune-cornerpoint
destination directory: dune-cornerpoint
requesting all changes
adding changesets
adding manifests
adding file changes
added 832 changesets with 1865 changes to 287 files
updating to branch default
pulling subrepo grid/preprocess from
http://public.ict.sintef.no/opm/hg/cpgpreprocess
requesting all changes
adding changesets
adding manifests
adding file changes
added 73 changesets with 167 changes to 28 files
116 files updated, 0 files merged, 0 files removed, 0 files unresolved
Comment 3 kiilerix 2010-10-26 06:15 UTC
Right, got it.

The cleanup from http://selenic.com/repo/hg//rev/9294dce4b633 is usually
done _after_ the proxyhandler has read the environment variables, but with
subrepos it comes back again later.

I'm not sure how we should fix this. Hacking os.environ is ugly and fragile.
Perhaps we should read all relevant variables initially and store them
instead of reading on-the-fly. Storing in config would be an obvious choice,
but they are perhaps not sufficiently global.

(For testing we can use tinyproxy.py as test-http-proxy.t does and either
hack the proxy to rewrite something so we know it is used, or just check the
logs.)
Comment 4 Håvard Berland 2010-10-28 14:47 UTC
Attached is one first stab at writing a test-case, that tries to verify that 
the proxy is actually used for all access.
Comment 5 Håvard Berland 2010-10-30 13:45 UTC
Attached is an updated test-subrepo-http-proxy.t

Given the following patch to url.py, it will pass, but not assuming this is 
a good solution. I tried to guess the point about del os.environ was to be 
able to override http_proxy from environment if it was supplied using --
config, but this did not seem to work anyway before I deleted del 
os.environ, so there is more to it.

--- a/mercurial/url.py	Sat Oct 30 21:02:27 2010 +0200
+++ b/mercurial/url.py	Sat Oct 30 21:11:02 2010 +0200
@@ -222,15 +222,6 @@
         else:
             proxies = {}
 
-        # urllib2 takes proxy values from the environment and those
-        # will take precedence if found, so drop them
-        for env in ["HTTP_PROXY", "http_proxy", "no_proxy"]:
-            try:
-                if env in os.environ:
-                    del os.environ[env]
-            except OSError:
-                pass
-
         urllib2.ProxyHandler.__init__(self, proxies)
         self.ui = ui
Comment 6 Simon King 2011-08-13 14:20 UTC
FWIW, the change looks good to me.

urllib2 only uses the environment variables if the ProxyHandler is
initialised with None. url.py's proxyhandler always passes a dict of proxies
(which may be empty), thus bypassing the environment variable check.

The code deleting the environment variables was added a long time ago
(revision 9294dce4b633), before the empty-dict mechanism was introduced
(revision b3afa6725082), and is no longer relevant.
Comment 7 HG Bot 2011-09-10 18:00 UTC
Fixed by http://selenic.com/repo/hg/rev/02734d2baa79
Renato Cunha <renatoc@gmail.com>
url: Remove the proxy env variables only when needed (issue2451)

(please test the fix)
Comment 8 Bugzilla 2012-05-12 09:13 UTC

--- Bug imported by bugzilla@serpentine.com 2012-05-12 09:13 EDT  ---

This bug was previously known as _bug_ 2451 at http://mercurial.selenic.com/bts/issue2451
Imported an attachment (id=1475)
Imported an attachment (id=1476)
Comment 9 HG Bot 2017-10-04 16:55 UTC
Fixed by https://mercurial-scm.org/repo/hg/rev/69459fdf3b1b
Pulkit Goyal <7895pulkit@gmail.com>
url: remove unnecessary deletion of environ variables while dealing with proxy

Currently we delete proxy environment variables if ui.config contains proxy
values. This is unnecessary because urllib2.ProxyHandler class only reads proxy
from environment it is initialised by None. But url.py never passes None,
so there is no point urllib2 will take environment variables in account.

This also prevents deleting environment variables which is not safe.

This code was introduced while resolving Bug 2451 even it is in one of comments
(sixth one) on bug that we can safely remove this part.
Link to bug : https://bz.mercurial-scm.org/show_bug.cgi?id=2451

(please test the fix)