D2769: hgweb: refactor the request draining code

indygreg (Gregory Szorc) phabricator at mercurial-scm.org
Sat Mar 10 01:23:30 UTC 2018


indygreg created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  The previous code for draining was only invoked in a few places in
  the wire protocol. Behavior wasn't consist. Furthermore, it was
  difficult to reason about.
  
  With us converting the input stream to a capped reader, it is now
  safe to always drain the input stream when its size is known because
  we can never overrun the input and read into the next HTTP request.
  The only question is "should we?"
  
  This commit changes the draining code so every request is examined.
  Draining now kicks in for a few requests where it wouldn't before.
  But I think the code is sufficiently restricted so the behavior is
  safe. Possibly the most dangerous part of this code is the issuing
  of Connection: close for POST and PUT requests that don't have a
  Content-Length. I don't think there are any such uses in our WSGI
  application, so this should be safe.
  
  In the near future, I plan to significantly refactor the WSGI
  response handling. I anticipate this code evolving a bit. So any
  minor regressions around draining or connection closing behavior
  might be fixed as a result of that work.
  
  All tests pass with this change. That scares me a bit because it
  means we are lacking low-level tests for the HTTP protocol.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2769

AFFECTED FILES
  mercurial/hgweb/request.py
  mercurial/wireprotoserver.py

CHANGE DETAILS

diff --git a/mercurial/wireprotoserver.py b/mercurial/wireprotoserver.py
--- a/mercurial/wireprotoserver.py
+++ b/mercurial/wireprotoserver.py
@@ -301,9 +301,6 @@
         wsgireq.respond(HTTP_OK, HGTYPE, body=rsp)
         return []
     elif isinstance(rsp, wireprototypes.pusherr):
-        # This is the httplib workaround documented in _handlehttperror().
-        wsgireq.drain()
-
         rsp = '0\n%s\n' % rsp.res
         wsgireq.respond(HTTP_OK, HGTYPE, body=rsp)
         return []
@@ -316,21 +313,6 @@
 def _handlehttperror(e, wsgireq, req):
     """Called when an ErrorResponse is raised during HTTP request processing."""
 
-    # Clients using Python's httplib are stateful: the HTTP client
-    # won't process an HTTP response until all request data is
-    # sent to the server. The intent of this code is to ensure
-    # we always read HTTP request data from the client, thus
-    # ensuring httplib transitions to a state that allows it to read
-    # the HTTP response. In other words, it helps prevent deadlocks
-    # on clients using httplib.
-
-    if (req.method == 'POST' and
-        # But not if Expect: 100-continue is being used.
-        (req.headers.get('Expect', '').lower() != '100-continue')):
-        wsgireq.drain()
-    else:
-        wsgireq.headers.append((r'Connection', r'Close'))
-
     # TODO This response body assumes the failed command was
     # "unbundle." That assumption is not always valid.
     wsgireq.respond(e, HGTYPE, body='0\n%s\n' % pycompat.bytestr(e))
diff --git a/mercurial/hgweb/request.py b/mercurial/hgweb/request.py
--- a/mercurial/hgweb/request.py
+++ b/mercurial/hgweb/request.py
@@ -254,12 +254,6 @@
         self.server_write = None
         self.headers = []
 
-    def drain(self):
-        '''need to read all data from request, httplib is half-duplex'''
-        length = int(self.env.get('CONTENT_LENGTH') or 0)
-        for s in util.filechunkiter(self.inp, limit=length):
-            pass
-
     def respond(self, status, type, filename=None, body=None):
         if not isinstance(type, str):
             type = pycompat.sysstr(type)
@@ -292,6 +286,49 @@
             elif isinstance(status, int):
                 status = statusmessage(status)
 
+            # Various HTTP clients (notably httplib) won't read the HTTP
+            # response until the HTTP request has been sent in full. If servers
+            # (us) send a response before the HTTP request has been fully sent,
+            # the connection may deadlock because neither end is reading.
+            #
+            # We work around this by "draining" the request data before
+            # sending any response in some conditions.
+            drain = False
+            close = False
+
+            if self.env[r'REQUEST_METHOD'] in (r'POST', r'PUT'):
+                # If we can't guarantee the end of stream, close the connection
+                # because any reading may be unsafe.
+                if not isinstance(self.inp, util.cappedreader):
+                    close = True
+                else:
+                    # We know the length of the input. So draining is possible.
+                    # Should we drain?
+
+                    # If the client sent an Expect: 100-continue, we assume
+                    # the client is intelligent and can handle no draining.
+                    if (self.env.get(r'HTTP_EXPECT', r'').lower()
+                        == r'100-continue'):
+                        pass
+                    else:
+                        # We /could/ only drain certain HTTP response codes.
+                        # But 200 and non-200 wire protocol responses both
+                        # require draining. Since we have a capped reader in
+                        # place for all situations where we drain, it is safe
+                        # to read from this stream. We'll either do a drain
+                        # or no-op if we're already at EOF.
+                        drain = True
+
+            if close:
+                self.headers.append((r'Connection', r'Close'))
+
+            if drain:
+                assert isinstance(self.inp, util.cappedreader)
+                while True:
+                    chunk = self.inp.read(32768)
+                    if not chunk:
+                        break
+
             self.server_write = self._start_response(
                 pycompat.sysstr(status), self.headers)
             self._start_response = None



To: indygreg, #hg-reviewers
Cc: mercurial-devel


More information about the Mercurial-devel mailing list