RFC: removing thread safety from localrepository class

Matt Mackall mpm at selenic.com
Mon Aug 10 15:29:08 CDT 2015


On Sun, 2015-08-02 at 14:39 -0700, Gregory Szorc wrote:
> Currently, localrepository is theoretically thread safe. There is various
> code that jumps through hoops to try to provide this guarantee (see
> revlog.revision()).
> 
> Thread safety is hard. There are various bugs around hgweb not being
> thread safe (these often seem to involve files like caches that are
> outside locks). And, I bet there are many more bugs waiting to be filed.
> Maintaining thread safety on a large code base is cognitively difficult.
> For Mercurial, it's an additional layer of data isolation that must be
> considered when writing any code (there are already locks to prevent
> inter-process contention).
> 
> The CPython GIL significantly limits performance benefits of multi-threaded
> code when CPU intensive Python code is executed. You can get significant
> benefits in some scenarios (e.g. using threads for reading and writing to
> sockets). But I argue the benefits to Mercurial are slim to none.
> 
> Given the complexity of maintaining thread safety of localrepository and
> the limited benefits of multi-threaded Mercurial, I'd like to propose that
> we drop support for thread safety of localrepository.
> 
> We can still support multi-threaded code in server processes, but it will
> use separate localrepository instances for each concurrent request. This
> may drop performance of servers in this configuration a little bit. But
> let's be honest: if you want good Mercurial performance, you should be
> running multiple processes - not threads - on your server.

I don't think it's just "a little bit" for lots of users. If I recall
correctly, running WSGI with multiple processes on Windows under Apache
or IIS is not even an available option (due to the lack of a fork()-like
operation). If we're being honest, you should of course flush Windows
from your IT operations, but we're not being honest, we're being
practical.

> Dropping thread safety of localrepository will open up the door for things
> like more aggressive caching of revlog content, which is the impetus for
> this request. For example, when adding revlog entries, we need to flush and
> reopen the revlog if we want to read just-written data. I'd love to make
> this code perform better. Doing so requires a more robust caching layer.
> And implementing a thread safe cache will be a lot of work. I'd rather not
> have that requirement.
> 
> Can we drop thread safety from localrepository instances?

I can see doing this if we have an instance pool so that we don't have
to pay the cost of instantiating a repo on EVERY request.
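
To make the instance pool idea concrete, here is a rough sketch in
Python. It is only an illustration under stated assumptions, not
existing Mercurial code: the RepoPool class, its acquire() method, and
the factory argument are hypothetical names. In a real server the
factory would presumably wrap whatever call builds a localrepository;
here it can be any callable, so the sketch runs on its own.

```python
import queue
import threading
from contextlib import contextmanager


class RepoPool:
    """Hypothetical pool that lends each request its own repo instance."""

    def __init__(self, factory, maxsize=4):
        # factory: any callable returning a new repo-like object (assumption).
        self._factory = factory
        # Idle instances; LIFO so the most recently used one is reused first.
        self._idle = queue.LifoQueue()
        # Caps how many instances can be checked out at the same time.
        self._slots = threading.BoundedSemaphore(maxsize)

    @contextmanager
    def acquire(self):
        self._slots.acquire()  # blocks while maxsize instances are in use
        try:
            try:
                repo = self._idle.get_nowait()  # reuse an idle instance
            except queue.Empty:
                repo = self._factory()  # otherwise pay the setup cost once
            try:
                yield repo
            finally:
                self._idle.put(repo)  # hand it back for the next request
        finally:
            self._slots.release()


if __name__ == "__main__":
    # Stand-in factory; a real one would construct a repository object.
    pool = RepoPool(factory=dict, maxsize=2)
    with pool.acquire() as repo:
        repo["served"] = repo.get("served", 0) + 1
```

Each thread gets exclusive use of an instance for the duration of a
request, so the instance itself never needs to be thread safe. The
sketch leaves out things a real pool would likely need, such as
noticing that the on-disk repo changed between requests or discarding
an instance after a failed request.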

-- 
Mathematics is the supreme nostalgia of our time.


