Bug 2188 - Operations Fail with 404 with lots of repos in hgweb.config and small RPS load
Status: RESOLVED FIXED
Alias: None
Product: Mercurial
Classification: Unclassified
Component: Mercurial
Version: unspecified
Hardware: All All
Importance: urgent bug
Assignee: Bugzilla
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-05-13 11:51 UTC by Matt Hawley
Modified: 2012-05-13 05:04 UTC

See Also:
Python Version: ---


Description Matt Hawley 2010-05-13 11:51 UTC
In a hosted environment that has lots of Hg repositories (CodePlex) we are 
running into issues where about 10-12 Requests Per Second to the server 
will result in 404 errors on simple operations (as well as cause pushes to 
fail with 500 errors). During our performance testing, we identified that 
setting up the hgweb.config file as

[paths]
/ = \\repo\path\*

was very inefficient: as the number of repositories grows, the time taken 
for operations gets linearly worse (we were seeing about 90 
seconds to pull/push with 5000 repositories). Looking at the source at 
that time indicated that using the wildcard scans the file system to 
generate a list of repositories. Our solution for this performance problem 
was to use the following format in the hgweb.config file:

[paths]
/repo1 = \\repo\path\repo1
/repo2 = \\repo\path\repo2
...
/repoN = \\repo\path\repoN
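
An explicit list like the one above can be generated rather than maintained by hand. The following is a minimal sketch, assuming every repository is a direct child directory of a single root and contains a `.hg` directory; the function name `build_paths_section` is hypothetical and not part of Mercurial.

```python
import os


def build_paths_section(root):
    """Return hgweb.config text with one explicit entry per repository.

    Assumption: repositories are direct children of ``root`` and each
    one contains a ``.hg`` directory.
    """
    lines = ["[paths]"]
    for name in sorted(os.listdir(root)):
        repo = os.path.join(root, name)
        # Skip anything that is not a Mercurial repository.
        if os.path.isdir(os.path.join(repo, ".hg")):
            lines.append("/%s = %s" % (name, repo))
    return "\n".join(lines) + "\n"
```

The directory scan happens once, when the file is generated, instead of inside hgweb.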

However, now that our hgweb.config file has grown to a decent size (we 
have 1000+ repositories hosted), we start seeing 404 errors reported by 
hgweb under the load previously mentioned. Looking at the performance 
monitors on the server shows that we are not CPU bound (it hovers <15%). 

The only thing I can assume is that reading the hgweb.config sequentially 
each time is keeping the server from serving the requests 
fast enough. Because of this, our users are experiencing problems when 
accessing their repositories.

Our setup and performance testing has confirmed this to be the case when 
hosted in IIS 7.5 with both hgwebdir.cgi and ISAPI-WSGI hosting.
Comment 1 Matt Mackall 2010-05-13 12:30 UTC
I suspect reading the config file is not a problem: the config parser is
pretty straightforward and there's not much per-repo work done at that level.
There's also pre-existing logic to only refresh the config every 20 seconds
or so.

But if you have a page that generates a repo index, that will visit each
repository and could be quite time-consuming.
Comment 2 Matt Hawley 2010-05-13 12:38 UTC
We block the repo index page, so I don't think that is the case. What is 
very interesting is the consistent repro of 404s when RPS is constant and 
we do a push. At that point, IIS (or CGI/ISAPI) returns 404s for the 
duration of the push, then returns to normal.
Comment 3 Matt Mackall 2010-05-13 13:08 UTC
Hmm, that's sounding like an IIS issue (or possibly an issue in your WSGI
adapter). Push operates by uploading a bundle via multipart POST, then
unbundling. So the bulk of it happens without any sort of locking. The
latter half (unpacking the bundle) only takes write locks on the target
repository - readers have no locks.
Comment 4 Matt Hawley 2010-05-14 10:04 UTC
The problem is exacerbated when doing a push. With enough load, simple 
pulls or even browsing hgweb starts to fail with 404s.

What can I do to help identify whether this is an hgweb / Mercurial problem 
rather than an IIS one? I don't have access to a Linux box, but would it be 
possible for someone to do some rudimentary RPS tests with a large number of repos?
Comment 5 Matt Mackall 2010-05-14 11:35 UTC
Ok, I've reproduced this with 5000 repos on my laptop.

Method:
- create a one cset repo in a
- for i in `seq 5000`; do cp -a a $i; done
- echo "[paths]" > hgweb.config
- for i in `seq 5000`; do echo "$i = $PWD/$i" >> hgweb.config; done
- add sys.stderr.write() messages to mercurial/hgweb/hgwebdir_mod.py
- set up stock Apache with hgweb.wsgi
- fire off multiple windows doing:
  for i in `seq 5000`; do wget -O /dev/null http://localhost/$i; done
- watch logs
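
The request loop in the method above can also be driven from one script that tallies status codes instead of relying on the logs. This is only a sketch under assumptions: Python 3, a server at whatever base URL you pass in, and hypothetical helper names `fetch_status` and `count_statuses`.

```python
import collections
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url):
    """Return the HTTP status code for url (for example 200 or 404)."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx; the code is what we want to count.
        return err.code


def count_statuses(base_url, names, workers=4, fetch=fetch_status):
    """Request base_url/<name> for each name concurrently and tally codes."""
    urls = ["%s/%s" % (base_url.rstrip("/"), n) for n in names]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return collections.Counter(pool.map(fetch, urls))
```

For example, `count_statuses("http://localhost", range(1, 5001))` would request each numbered repo once; a non-zero count under the key 404 would point at the failure described here.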

The issue is that when hgweb refreshes its config every 20 seconds or so,
there's a window where the config is only partially read. During that
window, repos can 404. I'm not sure what the threading model is here, but
I've shrunk the race considerably by using some temporary variables in the
refresh code. Now I can't hit it anymore.

Please test this patch:

diff -r 39e7f14a8286 mercurial/hgweb/hgwebdir_mod.py
--- a/mercurial/hgweb/hgwebdir_mod.py	Fri May 14 10:01:09 2010 -0500
+++ b/mercurial/hgweb/hgwebdir_mod.py	Fri May 14 12:35:25 2010 -0500
@@ -56,21 +56,33 @@
             return
 
         if self.baseui:
-            self.ui = self.baseui.copy()
+            u = self.baseui.copy()
         else:
-            self.ui = ui.ui()
-            self.ui.setconfig('ui', 'report_untrusted', 'off')
-            self.ui.setconfig('ui', 'interactive', 'off')
+            u = ui.ui()
+            u.setconfig('ui', 'report_untrusted', 'off')
+            u.setconfig('ui', 'interactive', 'off')
 
         if not isinstance(self.conf, (dict, list, tuple)):
             map = {'paths': 'hgweb-paths'}
-            self.ui.readconfig(self.conf, remap=map, trust=True)
-            paths = self.ui.configitems('hgweb-paths')
+            u.readconfig(self.conf, remap=map, trust=True)
+            paths = u.configitems('hgweb-paths')
         elif isinstance(self.conf, (list, tuple)):
             paths = self.conf
         elif isinstance(self.conf, dict):
             paths = self.conf.items()
 
+        repos = findrepos(paths)
+        for prefix, root in u.configitems('collections'):
+            prefix = util.pconvert(prefix)
+            for path in util.walkrepos(root, followsym=True):
+                repo = os.path.normpath(path)
+                name = util.pconvert(repo)
+                if name.startswith(prefix):
+                    name = name[len(prefix):]
+                repos.append((name.lstrip('/'), repo))
+
+        self.repos = repos
+        self.ui = u
         encoding.encoding = self.ui.config('web', 'encoding',
                                            encoding.encoding)
         self.style = self.ui.config('web', 'style', 'paper')
@@ -78,17 +90,6 @@
         if self.stripecount:
             self.stripecount = int(self.stripecount)
         self._baseurl = self.ui.config('web', 'baseurl')
-
-        self.repos = findrepos(paths)
-        for prefix, root in self.ui.configitems('collections'):
-            prefix = util.pconvert(prefix)
-            for path in util.walkrepos(root, followsym=True):
-                repo = os.path.normpath(path)
-                name = util.pconvert(repo)
-                if name.startswith(prefix):
-                    name = name[len(prefix):]
-                self.repos.append((name.lstrip('/'), repo))
-
         self.lastrefresh = time.time()
 
     def run(self):
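
The idea behind the patch above, stripped of hgweb specifics, is to build the new state in local variables and publish it only once it is complete. The sketch below is a toy illustration, not the real hgwebdir class: `RepoIndex`, `load_paths`, and `lookup` are hypothetical names, and the 20-second default simply mirrors the refresh interval mentioned in this report.

```python
import time


class RepoIndex(object):
    """Toy stand-in for a periodically refreshed repository list."""

    def __init__(self, load_paths, interval=20.0):
        self.load_paths = load_paths  # callable returning (name, path) pairs
        self.interval = interval      # seconds between refreshes
        self.repos = []
        self.lastrefresh = None

    def refresh(self, now=None):
        """Rebuild the list if the interval has elapsed; return True if rebuilt."""
        now = time.time() if now is None else now
        if self.lastrefresh is not None and now < self.lastrefresh + self.interval:
            return False
        # Build the complete list in a local variable first. Other threads
        # keep seeing the old, complete self.repos while this loop runs.
        repos = []
        for name, path in self.load_paths():
            repos.append((name.lstrip('/'), path))
        # Publish the finished list with a single attribute assignment.
        self.repos = repos
        self.lastrefresh = now
        return True

    def lookup(self, name):
        """Return the path for name, or None (which a server would turn into a 404)."""
        for n, path in self.repos:
            if n == name:
                return path
        return None
```

Appending to `self.repos` directly inside the loop would expose a half-built list to concurrent lookups, which is the window this change narrows.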
Comment 6 Matt Hawley 2010-05-14 14:42 UTC
Thanks Matt, I've applied the patch and our perf team will do another 
round of testing. Will let you know when that is finished.
Comment 7 HG Bot 2010-05-15 22:00 UTC
Fixed by http://hg.intevation.org/mercurial/crew/rev/ed5d2a7c4b73
(hgweb: fix race in refreshing repo list (issue2188))
Comment 8 Matt Hawley 2010-05-21 12:27 UTC
We finished our perf testing, and we did not experience the 404s with 
the patch applied.
Comment 9 HG Bot 2010-05-31 14:00 UTC
Fixed by http://hg.intevation.org/mercurial/crew/rev/99bc18d1ab0f
(hgweb: fix race in refreshing repo list (issue2188))
Comment 10 Bugzilla 2012-05-12 09:09 UTC

--- Bug imported by bugzilla@serpentine.com 2012-05-12 09:09 EDT  ---

This bug was previously known as _bug_ 2188 at http://mercurial.selenic.com/bts/issue2188