Speeding up Mercurial on NFS

Matt Mackall mpm at selenic.com
Mon Jan 10 12:19:01 CST 2011


On Mon, 2011-01-10 at 14:45 +0100, Martin Geisler wrote:
> Matt Mackall <mpm at selenic.com> writes:
> 
> > On Tue, 2010-12-07 at 16:34 +0100, Martin Geisler wrote:
> >> Matt Mackall <mpm at selenic.com> writes:
> >> 
> >> > On Mon, 2010-12-06 at 08:23 -0800, mg at lazybytes.net wrote:
> >> >
> >> >> Two years ago, someone asked if Git could be made to run faster
> >> >> over NFS. The result was a patch that made Git use threads to
> >> >> preload data in parallel. This gave a 5x speedup:
> >> >>
> >> >>   http://kerneltrap.org/mailarchive/git/2008/11/14/4089834/thread
> >> >>
> >> >> I have tried to replicate these results with some simple test
> >> >> programs:
> >> >>
> >> >>   http://bitbucket.org/mg/parallelwalk
> >> >>
> >> >> The 'walker' program is a single-threaded C-based program that
> >> >> will walk a directory tree as fast as possible, 'pywalker' is a
> >> >> Python-based version of it, but with support for multiple threads.
> 
> I've updated the C program so that it can use multiple threads to walk
> the directory tree. The result is at the above URL. Please forgive the
> memory leaks; I traded memory for development time in the prototype...
> 
> If you are an expert in writing multi-threaded C programs, then please
> take a look at these files and give me your tips:
> 
>   https://bitbucket.org/mg/parallelwalk/src/tip/walker.c
>   https://bitbucket.org/mg/parallelwalk/src/tip/queue.h
> 
> 
> Running the walkers on a local disk without NFS gives these results with
> the OpenOffice repository (walking a 2.0 GB working copy):
> 
>   threads  pywalker  walker
>     1      10.1 s    11.0 s
>     2       9.6 s     8.3 s
>     4       7.0 s     7.5 s
>     8       6.8 s     5.4 s
>    16       6.3 s     5.3 s
>    32       6.6 s     5.6 s
>   128       6.4 s     5.6 s

> Each test was run three times and the disk cache was cleared before each
> run in order to simulate a cold start:
> 
>   timeit -q --pre 'sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"' \
>          'walker -t $N -q'

I think dropping the cache here invalidates the comparison with the NFS
numbers below. The important case for NFS is when the _server_ has a
warm cache and doesn't need to talk to the disk for a lookup. If you do
a warm-cache test here, you'll get a target number for the best you can
hope to get out of the NFS case.
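
That is, the same measurement as above, just without the drop_caches
step, something like:

  timeit -q 'walker -t $N -q'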

Also, it's just not very interesting. On spinning media, we know that
we're going to have a seek bottleneck that multithreading can only
exacerbate. On SSD, we're either going to hit a req/s barrier that's
lower than our syscall throughput (no scaling), saturate the syscall
interface (scaling up to the number of cores), or saturate the
filesystem lookup locks (but not on modern Linux on a desktop machine!).
What you have above appears to be an SSD and two cores, yes?

By comparison, threaded NFS lookup is all about saturating the pipe,
because the (server-cached) lookup is much faster than the request
round trip.
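
Just to make the shape of the win concrete, here's a throwaway Python
sketch of that kind of walker: a work queue plus a handful of threads
doing the lstat()s. This is not pywalker's actual code, and error
handling is omitted:

  import os, stat, sys, threading
  try:
      import Queue as queue            # Python 2
  except ImportError:
      import queue                     # Python 3

  def threaded_walk(root, nthreads=8):
      # lstat() every entry under root, keeping nthreads lookups in flight.
      work = queue.Queue()
      work.put(root)
      lock = threading.Lock()
      count = [0]

      def worker():
          while True:
              d = work.get()
              if d is None:                    # sentinel: shut down
                  return
              for name in os.listdir(d):
                  path = os.path.join(d, name)
                  st = os.lstat(path)          # one NFS round trip per entry
                  with lock:
                      count[0] += 1
                  if stat.S_ISDIR(st.st_mode):
                      work.put(path)           # hand subdirs to an idle worker
              work.task_done()                 # this directory is finished

      threads = [threading.Thread(target=worker) for _ in range(nthreads)]
      for t in threads:
          t.start()
      work.join()                              # all queued directories processed
      for t in threads:
          work.put(None)
      for t in threads:
          t.join()
      return count[0]

  if __name__ == '__main__':
      print(threaded_walk(sys.argv[1]))

The GIL matters less here than usual, since os.lstat() releases it
while a thread sits blocked waiting on the server.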

How many files are you walking here?

> Running over an artificially slow (0.2 ms delay) NFS link back to
> localhost gives:
> 
>   threads  pywalker  walker
>     1       9.0 s     8.2 s
>     2       6.3 s     4.5 s
>     4       6.1 s     2.7 s
>     8       5.9 s     1.5 s
>    16       6.0 s     1.7 s
>    32       6.0 s     1.9 s

Interesting. Looks like you're getting about 6-8 parallel requests per
round-trip time. But that seems way too slow: that'd only be ~15k
requests per second, or 22,500 - 30,000 files total. Or, if that 0.2 ms
is the round-trip delay, 45,000 - 60,000 files total.
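
Spelling that arithmetic out (a back-of-envelope sketch in Python; it
assumes the server-side lookup cost is negligible next to the delay):

  delay = 0.0002                   # the 0.2 ms artificial delay
  best = 1.5                       # fastest walker run over NFS (8 threads)
  for rtt in (2 * delay, delay):   # one-way vs. round-trip reading of 0.2 ms
      for inflight in (6, 8):      # parallel requests per round trip
          rate = inflight / rtt    # requests per second
          print('%.1f ms rtt, %d in flight: %6.0f req/s, ~%6.0f files'
                % (rtt * 1000, inflight, rate, rate * best))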

You should also run this test without the delay. Again, this will give
you a target baseline for what you can hope to get out of NFS. It should
saturate around threads = cores, but should probably be marginally
faster than that 1.5s number there.


-- 
Mathematics is the supreme nostalgia of our time.



