In a hosted environment with lots of Hg repositories (CodePlex), we are running into issues where about 10-12 requests per second to the server result in 404 errors on simple operations (and cause pushes to fail with 500 errors).

During our performance testing, we found that setting up the hgweb.config file as

    [paths]
    / = \\repo\path\*

was very inefficient. As the number of repositories grows, operations get linearly slower (we were seeing about 90 seconds to pull/push with 5000 repositories). Looking at the source at the time, we saw that the wildcard makes hgweb scan the file system to generate the list of repositories.

Our solution for this performance problem was to use the following format in the hgweb.config file:

    [paths]
    /repo1 = \\repo\path\repo1
    /repo2 = \\repo\path\repo2
    ...
    /repoN = \\repo\path\repoN

However, now that our hgweb.config file has grown to a decent size (we host 1000+ repositories), we start seeing 404 errors reported by hgweb under the load mentioned above. The performance monitors on the server show that we are not CPU bound (usage hovers below 15%). The only explanation I can come up with is that reading hgweb.config sequentially on each request keeps the server from serving requests fast enough. As a result, our users are experiencing problems when accessing their repositories. Our setup and performance testing have confirmed this behavior when hosted in IIS 7.5 with both hgwebdir.cgi and ISAPI-WSGI hosting.
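The explicit-paths workaround described above is easy to script. The sketch below assumes every repository lives directly under one root directory and can be recognized by its `.hg` subdirectory. It writes one `/name = path` line per repository instead of a single wildcard entry. The function name `write_hgweb_config` is hypothetical and not part of Mercurial.

```python
import os


def write_hgweb_config(root, config_path):
    """Write an hgweb.config with one explicit [paths] entry per repository.

    Hypothetical helper: assumes each repository is a direct child of
    `root` and contains a `.hg` directory. Returns the number of entries.
    """
    names = sorted(
        entry for entry in os.listdir(root)
        if os.path.isdir(os.path.join(root, entry, ".hg"))
    )
    with open(config_path, "w") as f:
        f.write("[paths]\n")
        for name in names:
            # Explicit entry instead of "/ = root/*"; the wildcard form is
            # what makes hgweb scan the file system for repositories.
            f.write("/%s = %s\n" % (name, os.path.join(root, name)))
    return len(names)
```

The script would need to be rerun whenever a repository is added or removed. That is the trade-off for avoiding the directory scan.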
I suspect reading the config file is not a problem: the config parser is pretty straightforward and there's not much per-repo work done at that level. There's also pre-existing logic to only refresh the config every 20 seconds or so. But if you have a page that generates a repo index, that will visit each repository and could be quite time-consuming.
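The refresh throttling mentioned here can be illustrated with a minimal sketch. The class name `ThrottledConfig`, its `loader` callable, and the injectable `clock` are hypothetical. Only the idea of re-reading the configuration at most once per refresh interval (about 20 seconds) comes from the discussion.

```python
import time


class ThrottledConfig:
    """Re-read a configuration at most once per `refreshinterval` seconds.

    Hypothetical illustration of interval-based refreshing; `loader`
    stands in for parsing hgweb.config and returns the list of paths.
    """

    def __init__(self, loader, refreshinterval=20, clock=time.time):
        self.loader = loader
        self.refreshinterval = refreshinterval
        self.clock = clock
        self.lastrefresh = 0
        self.paths = None

    def refresh(self):
        # Skip the reload while the previous one is still fresh.
        if self.lastrefresh + self.refreshinterval > self.clock():
            return
        self.paths = self.loader()
        self.lastrefresh = self.clock()
```

With a throttle like this, the cost of parsing the file is paid once per interval rather than once per request. This supports the point that the parse itself is unlikely to be the bottleneck.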
We block the repo index page, so I don't think that is the case. What was very interesting is that we can consistently reproduce the 404s by holding the RPS constant and doing a push: for the duration of the push, IIS (or CGI/ISAPI) returns 404s, then returns to normal.
Hmm, that's sounding like an IIS issue (or possibly an issue in your WSGI adapter). Push operates by uploading a bundle via multipart POST, then unbundling. So the bulk of it happens without any sort of locking. The latter half (unpacking the bundle) only takes write locks on the target repository - readers have no locks.
The problem is exacerbated when doing a push. With enough load, simple pulls or even browsing hgweb start to fail with 404s. What can I do to help identify whether this is an hgweb/Mercurial problem over IIS? I don't have access to a Linux box, but would it be possible for someone to run some rudimentary RPS tests with a large number of repos?
Ok, I've reproduced this with 5000 repos on my laptop. Method:

- create a one-changeset repo in directory a
- for i in `seq 5000`; do cp -a a $i; done
- echo "[paths]" > hgweb.config
- for i in `seq 5000`; do echo "$i = $PWD/$i" >> hgweb.config; done
- add sys.stderr.write() messages to mercurial/hgweb/hgwebdir_mod.py
- set up stock Apache with hgweb.wsgi
- fire off multiple windows doing: for i in `seq 5000`; do wget -O /dev/null http://localhost/$i; done
- watch logs

The issue is that when hgweb refreshes its config every 20 seconds or so, there's a window where the config is only partially read. During that window, repos can 404.

I'm not sure what the threading model is here, but I've shrunk the race considerably by using some temporary variables in the refresh code. Now I can't hit it anymore. Please test this patch:

diff -r 39e7f14a8286 mercurial/hgweb/hgwebdir_mod.py
--- a/mercurial/hgweb/hgwebdir_mod.py	Fri May 14 10:01:09 2010 -0500
+++ b/mercurial/hgweb/hgwebdir_mod.py	Fri May 14 12:35:25 2010 -0500
@@ -56,21 +56,33 @@
             return
 
         if self.baseui:
-            self.ui = self.baseui.copy()
+            u = self.baseui.copy()
         else:
-            self.ui = ui.ui()
-            self.ui.setconfig('ui', 'report_untrusted', 'off')
-            self.ui.setconfig('ui', 'interactive', 'off')
+            u = ui.ui()
+            u.setconfig('ui', 'report_untrusted', 'off')
+            u.setconfig('ui', 'interactive', 'off')
 
         if not isinstance(self.conf, (dict, list, tuple)):
             map = {'paths': 'hgweb-paths'}
-            self.ui.readconfig(self.conf, remap=map, trust=True)
-            paths = self.ui.configitems('hgweb-paths')
+            u.readconfig(self.conf, remap=map, trust=True)
+            paths = u.configitems('hgweb-paths')
         elif isinstance(self.conf, (list, tuple)):
             paths = self.conf
         elif isinstance(self.conf, dict):
             paths = self.conf.items()
 
+        repos = findrepos(paths)
+        for prefix, root in u.configitems('collections'):
+            prefix = util.pconvert(prefix)
+            for path in util.walkrepos(root, followsym=True):
+                repo = os.path.normpath(path)
+                name = util.pconvert(repo)
+                if name.startswith(prefix):
+                    name = name[len(prefix):]
+                repos.append((name.lstrip('/'), repo))
+
+        self.repos = repos
+        self.ui = u
         encoding.encoding = self.ui.config('web', 'encoding',
                                            encoding.encoding)
         self.style = self.ui.config('web', 'style', 'paper')
@@ -78,17 +90,6 @@
         if self.stripecount:
             self.stripecount = int(self.stripecount)
         self._baseurl = self.ui.config('web', 'baseurl')
-
-        self.repos = findrepos(paths)
-        for prefix, root in self.ui.configitems('collections'):
-            prefix = util.pconvert(prefix)
-            for path in util.walkrepos(root, followsym=True):
-                repo = os.path.normpath(path)
-                name = util.pconvert(repo)
-                if name.startswith(prefix):
-                    name = name[len(prefix):]
-                self.repos.append((name.lstrip('/'), repo))
-
         self.lastrefresh = time.time()
 
     def run(self):
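The core of the patch is to build the new repository list in a local variable and publish it with a single assignment. A concurrent request then sees either the old list or the new one, never a half-built one. The sketch below is a deterministic toy illustration, not hgweb code. `RepoIndex`, `refresh_inplace`, `refresh_swap`, `lookup` and the `observe` hook (which stands in for a request arriving mid-refresh) are all hypothetical.

```python
class RepoIndex:
    """Toy stand-in for hgwebdir's repository list (hypothetical)."""

    def __init__(self, paths):
        self.repos = list(paths)

    def refresh_inplace(self, paths, observe):
        # Old pattern: the shared attribute is reset first and filled
        # incrementally, so a reader arriving mid-refresh sees a partial list.
        self.repos = []
        for i, entry in enumerate(paths):
            self.repos.append(entry)
            if i == 0:
                observe(list(self.repos))

    def refresh_swap(self, paths, observe):
        # Patched pattern: build into a temporary, then publish it with a
        # single assignment once it is complete.
        repos = []
        for i, entry in enumerate(paths):
            repos.append(entry)
            if i == 0:
                observe(list(self.repos))
        self.repos = repos


def lookup(repos, name):
    """Return the path for `name`, or None (the case hgweb reports as a 404)."""
    return dict(repos).get(name)


if __name__ == "__main__":
    paths = [("repo1", "/srv/repo1"), ("repo2", "/srv/repo2")]
    seen = []
    RepoIndex(paths).refresh_inplace(paths, seen.append)
    print(lookup(seen[0], "repo2"))  # None: repo2 is missing mid-refresh
```

Rebinding an attribute is one operation in CPython, so with the swap pattern a reader gets a reference to either the old list object or the new one.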
Thanks Matt, I've applied the patch and our perf team will do another round of testing. Will let you know when that is finished.
Fixed by http://hg.intevation.org/mercurial/crew/rev/ed5d2a7c4b73 (hgweb: fix race in refreshing repo list (issue2188))
We finished our perf testing, and we did not experience the 404s with the patch applied.
Fixed by http://hg.intevation.org/mercurial/crew/rev/99bc18d1ab0f (hgweb: fix race in refreshing repo list (issue2188))
--- Bug imported by bugzilla@serpentine.com 2012-05-12 09:09 EDT --- This bug was previously known as _bug_ 2188 at http://mercurial.selenic.com/bts/issue2188