Bug 4326 - Race Condition in HgWeb?
Summary: Race Condition in HgWeb?
Status: RESOLVED FIXED
Alias: None
Product: Mercurial
Classification: Unclassified
Component: hgweb
Version: 3.1
Hardware: PC Windows
Importance: normal bug
Assignee: Bugzilla
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-08-08 18:57 UTC by sgrinwis
Modified: 2017-08-11 11:54 UTC

See Also:
Python Version: ---


Description sgrinwis 2014-08-08 18:57 UTC
Apache 2.2.27 (Win64) mod_wsgi/3.5  Python 2.7.8 

This issue looks an awful lot like 3953; however, that was fixed in 2.6, and I'm seeing this in 2.7.

If we sync a lot of different repos at once, the server returns HTTP 500 errors, with tracebacks like this in the Apache logs:

mod_wsgi (pid=460): Exception occurred processing WSGI script 'E:/webroot/hgweb.wsgi'.
Traceback (most recent call last):
  File "C:\\Python27\\lib\\site-packages\\mercurial\\hgweb\\hgwebdir_mod.py", line 153, in __call__
    return self.run_wsgi(req)
  File "C:\\Python27\\lib\\site-packages\\mercurial\\hgweb\\hgwebdir_mod.py", line 218, in run_wsgi
    return hgweb(repo).run_wsgi(req)
  File "C:\\Python27\\lib\\site-packages\\mercurial\\hgweb\\hgweb_mod.py", line 68, in __init__
    r.baseui.setconfig('ui', 'report_untrusted', 'off', 'hgweb')
  File "C:\\Python27\\lib\\site-packages\\mercurial\\ui.py", line 165, in setconfig
    cfg.set(section, name, value, source)
  File "C:\\Python27\\lib\\site-packages\\mercurial\\config.py", line 64, in set
    self._data[section][item] = value
  File "C:\\Python27\\lib\\site-packages\\mercurial\\util.py", line 237, in __setitem__
    self._list.remove(key)
ValueError: list.remove(x): x not in list


Or:

 mod_wsgi (pid=460): Exception occurred processing WSGI script 'E:/webroot/hgweb.wsgi'.
 Traceback (most recent call last):
   File "C:\\Python27\\lib\\site-packages\\mercurial\\hgweb\\hgwebdir_mod.py", line 153, in __call__
     return self.run_wsgi(req)
   File "C:\\Python27\\lib\\site-packages\\mercurial\\hgweb\\hgwebdir_mod.py", line 218, in run_wsgi
     return hgweb(repo).run_wsgi(req)
   File "C:\\Python27\\lib\\site-packages\\mercurial\\hgweb\\hgweb_mod.py", line 68, in __init__
     r.baseui.setconfig('ui', 'report_untrusted', 'off', 'hgweb')
   File "C:\\Python27\\lib\\site-packages\\mercurial\\ui.py", line 165, in setconfig
     cfg.set(section, name, value, source)
   File "C:\\Python27\\lib\\site-packages\\mercurial\\config.py", line 64, in set
     self._data[section][item] = value
   File "C:\\Python27\\lib\\site-packages\\mercurial\\util.py", line 237, in __setitem__
     self._list.remove(key)



We're seeing several of these per second on average, and I can reproduce it at will.

It does not happen at low load; we only see it when around a hundred simultaneous syncs are happening.

Anything I can do to help, let me know.

--Steve
Comment 1 Matt Mackall 2014-08-09 16:34 UTC
#3953 was actually fixed in 2.8.2 (released Jan 1), but this does indeed seem identical. As your backtrace fingerprint actually matches 3.1, going to mark this confirmed.
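
For readers following along, the failure in the backtrace can be sketched roughly as below. This is only an illustration, not Mercurial's code: SortDict is a hypothetical stand-in modeled on the two lines visible in the traceback (a dict that also tracks key order in self._list), and the "preemption" is simulated by hand rather than with real threads so that it fails every time.

```python
class SortDict(dict):
    """Hypothetical stand-in for the ordered dict seen in the traceback."""

    def __init__(self):
        dict.__init__(self)
        self._list = []  # remembers insertion order of keys

    def __setitem__(self, key, value):
        if key in self:
            # Check-then-remove is not atomic: if another writer already
            # removed the key from _list, this raises ValueError.
            self._list.remove(key)
        self._list.append(key)
        dict.__setitem__(self, key, value)


def simulate_interleaving():
    """Replay one bad interleaving of two writers on a shared SortDict."""
    shared = SortDict()
    shared['report_untrusted'] = 'off'

    # Writer A: passed the "key in self" check and removed the key from
    # _list, but was preempted before appending it again.
    shared._list.remove('report_untrusted')

    # Writer B: runs __setitem__ in full. The dict still holds the key,
    # so B also tries to remove it from _list and fails.
    try:
        shared['report_untrusted'] = 'off'
    except ValueError as exc:
        return str(exc)
    return None


error_message = simulate_interleaving()
```

Under real load the same interleaving would only happen occasionally, which fits the report that the errors appear only with many simultaneous syncs.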
Comment 2 sgrinwis 2014-08-09 23:51 UTC
Thanks for the quick response.  That's really appreciated.

Sorry about the version snafu; I got the Hg and Python versions confused...  Long day at the office.

This is severely impacting a rather busy Hg server.

We typically see bursts of hundreds of repos being synced simultaneously, and a significant fraction of them are failing now. The race condition is happening more and more often as server load increases.

Is it appropriate to set the priority to urgent?

If I can be of any assistance, let me know.
Comment 3 HG Bot 2014-08-10 15:15 UTC
Fixed by http://selenic.com/repo/hg/rev/af62f0280a76
Matt Mackall <mpm@selenic.com>
hgweb: avoid config object race with hgwebdir (issue4326)

Turns out hgwebdir passes full repo objects to each hgweb request
instance, but with a shared baseui. We explicitly break the sharing.

(please test the fix)
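
The idea described in the commit message (stop sharing one mutable config object between request instances) can be sketched as follows. This is a minimal illustration under assumptions, not the actual change in af62f0280a76: BaseConfig and RequestHandler are hypothetical names, and the real Mercurial classes and signatures differ.

```python
import copy


class BaseConfig(object):
    """Hypothetical mutable config object, standing in for a shared baseui."""

    def __init__(self):
        self._data = {}

    def setconfig(self, section, name, value):
        self._data.setdefault(section, {})[name] = value

    def config(self, section, name, default=None):
        return self._data.get(section, {}).get(name, default)

    def copy(self):
        # Deep copy so the clone shares no mutable state with the original.
        clone = BaseConfig()
        clone._data = copy.deepcopy(self._data)
        return clone


class RequestHandler(object):
    """Hypothetical per-request object that needs to tweak the config."""

    def __init__(self, shared_config):
        # Break the sharing: mutate a private copy, never the shared object.
        self.config = shared_config.copy()
        self.config.setconfig('ui', 'report_untrusted', 'off')


shared = BaseConfig()
first = RequestHandler(shared)
second = RequestHandler(shared)
```

Because each handler only writes to its own copy, concurrent requests no longer write to the same underlying dict, at the cost of one copy per request.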
Comment 4 sgrinwis 2014-08-10 18:04 UTC
Ran a test against our test server with the code change provided.

Test was 450 simultaneous syncs.  No HTTP 500 errors thrown.

All fixed!  Thanks guys!  

--Steve