Speeding up Mercurial on NFS
Martin Geisler
mg at aragost.com
Mon Jan 10 07:45:24 CST 2011
Matt Mackall <mpm at selenic.com> writes:
> On Tue, 2010-12-07 at 16:34 +0100, Martin Geisler wrote:
>> Matt Mackall <mpm at selenic.com> writes:
>>
>> > On Mon, 2010-12-06 at 08:23 -0800, mg at lazybytes.net wrote:
>> >
>> >> Two years ago, someone asked if Git could be made to run faster
>> >> over NFS. The result was a patch that made Git use threads to
>> >> preload data in parallel. This gave a 5-fold speedup:
>> >>
>> >> http://kerneltrap.org/mailarchive/git/2008/11/14/4089834/thread
>> >>
>> >> I have tried to replicate these results with some simple test
>> >> programs:
>> >>
>> >> http://bitbucket.org/mg/parallelwalk
>> >>
>> >> The 'walker' program is a single-threaded C-based program that
>> >> will walk a directory tree as fast as possible, 'pywalker' is a
>> >> Python-based version of it, but with support for multiple threads.
I've updated the C program so that it can also use multiple threads to walk
the directory tree. The result is at the above URL. Please forgive the
memory leaks: I traded memory for development time in the prototype.
If you are an expert in writing multi-threaded C programs, then please
take a look at these files and give me your tips:
https://bitbucket.org/mg/parallelwalk/src/tip/walker.c
https://bitbucket.org/mg/parallelwalk/src/tip/queue.h
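One common way to structure a multi-threaded walk (not necessarily how walker.c or pywalker do it) is a shared queue of directories that N worker threads pull from. Each worker lists a directory, lstat()s its entries and pushes subdirectories back onto the queue, so several file system round-trips can be in flight at once. A minimal Python sketch under those assumptions follows; the function name `parallel_walk` and its return value are hypothetical.

```python
import os
import queue
import threading


def parallel_walk(root, num_threads=8):
    """Walk the tree under `root` using `num_threads` worker threads.

    Returns the paths of every entry that is not a real directory
    (regular files, symlinks, ...), in no particular order.
    """
    dirs = queue.Queue()  # directories waiting to be listed
    dirs.put(root)
    found = []
    found_lock = threading.Lock()

    def worker():
        while True:
            path = dirs.get()
            if path is None:  # sentinel: all work is done, exit
                dirs.task_done()
                return
            try:
                with os.scandir(path) as it:
                    for entry in it:
                        if entry.is_dir(follow_symlinks=False):
                            # Hand the subdirectory to whichever thread is idle.
                            dirs.put(entry.path)
                        else:
                            # lstat() the entry; on NFS each call may be a round-trip,
                            # which is the latency the extra threads try to overlap.
                            entry.stat(follow_symlinks=False)
                            with found_lock:
                                found.append(entry.path)
            except OSError:
                pass  # unreadable or vanished directory: skip the rest of it
            finally:
                dirs.task_done()  # only after its subdirectories were queued

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    dirs.join()  # returns once every queued directory has been processed
    for _ in threads:
        dirs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return found
```

Calling task_done() on a directory only after its subdirectories have been queued keeps the queue's unfinished-task count above zero until the whole tree is done, so dirs.join() cannot return early. CPython releases the GIL while blocked in the system calls behind os.scandir() and stat(), so threads can overlap I/O waits even in pure Python. The Python-level bookkeeping is still serialized, which may be one reason the pywalker column stops improving earlier than the C walker.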
Running the walkers on a local disk without NFS gives these results with
the OpenOffice repository (walking a 2.0 GB working copy):
  threads  pywalker  walker
        1    10.1 s  11.0 s
        2     9.6 s   8.3 s
        4     7.0 s   7.5 s
        8     6.8 s   5.4 s
       16     6.3 s   5.3 s
       32     6.6 s   5.6 s
      128     6.4 s   5.6 s
Each test was run three times and the disk cache was cleared before each
run in order to simulate a cold start:
  timeit -q --pre 'sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"' \
      'walker -t $N -q'
Running the walkers over an NFS mount of localhost, with an artificially
slow link (0.2 ms delay), gives:
  threads  pywalker  walker
        1     9.0 s   8.2 s
        2     6.3 s   4.5 s
        4     6.1 s   2.7 s
        8     5.9 s   1.5 s
       16     6.0 s   1.7 s
       32     6.0 s   1.9 s
The command line I used was:
  timeit -q --pre 'mount nfs' --post 'umount nfs' \
      'cd nfs/openoffice; walker -t $N -q'
Here the goal was to simulate an NFS client that connects to the server
for the first time, hence the mount/umount dance.
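The `timeit` command used for these measurements is not explained in this mail. As a hypothetical stand-in, a small Python harness with the same pre/post idea could look like the sketch below: it runs an optional setup command (dropping the page cache, or mounting the NFS export), times only the measured command, then runs an optional teardown command, repeating a fixed number of times. The name `time_cold` and its parameters are made up for illustration.

```python
import subprocess
import time


def time_cold(cmd, pre=None, post=None, runs=3):
    """Hypothetical harness: run shell command `cmd` `runs` times.

    `pre` and `post` are optional shell commands executed before and
    after every run; only `cmd` itself is timed. Returns a list of
    wall-clock durations in seconds, one per run.
    """
    timings = []
    for _ in range(runs):
        if pre:
            subprocess.run(pre, shell=True, check=True)  # e.g. drop caches or mount
        start = time.perf_counter()
        subprocess.run(cmd, shell=True, check=True)  # raises if cmd fails
        timings.append(time.perf_counter() - start)
        if post:
            subprocess.run(post, shell=True, check=True)  # e.g. umount
    return timings
```

For example, `time_cold("walker -t 8 -q", pre='sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"')` would mirror the local-disk runs. Whether to report the minimum, the median or the mean of the runs is a separate choice; the mail does not say which statistic the tables show.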
Judging from these results, we can get a speedup factor of 6 when
comparing the raw walking speed of a single-threaded Python program with
that of an 8-threaded C program.
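The factor of 6 is the 1-thread pywalker time divided by the 8-thread walker time in the NFS table (9.0 s / 1.5 s). The short snippet below recomputes the speedup of every row against that same baseline, using only the numbers from the table; the names `NFS_TIMINGS` and `speedups` are illustrative.

```python
# Seconds from the NFS table: threads -> (pywalker, walker).
NFS_TIMINGS = {
    1: (9.0, 8.2),
    2: (6.3, 4.5),
    4: (6.1, 2.7),
    8: (5.9, 1.5),
    16: (6.0, 1.7),
    32: (6.0, 1.9),
}


def speedups(timings, baseline):
    """Return {threads: (pywalker_speedup, walker_speedup)} relative to `baseline`."""
    return {n: (baseline / py, baseline / c) for n, (py, c) in timings.items()}


# Baseline: the single-threaded Python walker.
result = speedups(NFS_TIMINGS, baseline=NFS_TIMINGS[1][0])
for n in sorted(result):
    py_x, c_x = result[n]
    print(f"{n:>3} threads: pywalker {py_x:.1f}x, walker {c_x:.1f}x")
```

On these numbers the C walker is fastest at 8 threads; at 16 and 32 threads its time rises slightly again (1.7 s and 1.9 s), so adding threads beyond 8 does not help on this setup.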
This is only the raw walking speed: the real dirstate code does more
than walk the file system, so the final speedup will be lower.
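To put a rough number on "lower", Amdahl's law applies: if walking takes a fraction p of the dirstate code's total time and only that part gets faster by a factor s, the overall speedup is bounded as shown below. The value of p is not measured in this mail; p = 0.5 is purely illustrative.

```latex
S_{\text{total}} = \frac{1}{(1 - p) + p/s},
\qquad
p = 0.5,\ s = 6:\quad
S_{\text{total}} = \frac{1}{0.5 + 0.5/6} = \frac{1}{0.58\overline{3}} \approx 1.71
```

Even an infinitely fast walk (s going to infinity) cannot exceed 1/(1 - p), which is 2 for p = 0.5.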
Thoughts on the above?
--
Martin Geisler
aragost Trifork
Professional Mercurial support
http://aragost.com/en/services/mercurial/blog/