Speeding up Mercurial on NFS

Martin Geisler mg at aragost.com
Tue Jan 11 05:11:14 CST 2011


Matt Mackall <mpm at selenic.com> writes:

> On Mon, 2011-01-10 at 14:45 +0100, Martin Geisler wrote:
>
>>   timeit -q --pre 'sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"' \
>>          'walker -t $N -q'
>
> I think dropping the cache here invalidates the comparison with the
> below. The important case for NFS is when the _server_ has a warm
> cache and doesn't need to talk to the disk for a lookup. If you do a
> warm cache test here, you'll get a target number that will be the best
> you can hope to get out of the NFS case.

Right, good point. I was mostly including the local disk numbers in
order to see how single- vs multi-threaded walking compared to one
another and to see how the C program compared to Python. So I would say
the test shows that there is little to be gained by switching to C for
such an I/O-intensive program -- which is also what we would expect.

> Also, it's just not very interesting. On spinning media, we know that
> we're going to have a seek bottleneck that multithreading can only
> exacerbate. On SSD, we're either going to hit a req/s barrier that's
> lower than our syscall throughput (no scaling), or we're going to
> saturate the syscall interface (scaling up to the number of cores), or
> we're going to saturate the filesystem lookup locks (but not on modern
> Linux with a desktop machine!). What you have above appears to be an
> SSD and two cores, yes?

I agree with you that a single disk should max out as you describe...
but the above numbers are for a normal 7200 RPM 1 TB SATA disk and a
quad-core i7 930.

> By comparison, threaded NFS lookups is all about saturating the pipe
> because the (cached on server) lookup is much faster than the request
> round trip.
>
> How many files are you walking here?

There are 70k files -- it is the working copy of OpenOffice, changeset
67e476e04669. You sometimes talk about walking a repo with 207k files;
is that a public repo?

>> Running over an artificially slow (0.2 ms delay) NFS link back to
>> localhost gives:
>> 
>>   threads  pywalker  walker
>>     1       9.0 s     8.2 s
>>     2       6.3 s     4.5 s
>>     4       6.1 s     2.7 s
>>     8       5.9 s     1.5 s
>>    16       6.0 s     1.7 s
>>    32       6.0 s     1.9 s
>
> Interesting. Looks like you're getting about 6-8 parallel requests per
> round trip time. But that seems way too slow, that'd only be ~ 15k
> requests per second or 22500 - 30k files total. Or, if that .2ms is
> round-trip delay, 45k - 60k files total.

Yes, it's round-trip delay -- I add 0.1 ms delay to the link and the
ping time goes to 0.2 ms. I use

  sudo tc qdisc change dev lo root netem delay 0.1ms

to add a simple constant delay to all packets.
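
For what it's worth, here is the back-of-the-envelope calculation I
believe you are doing, written out as a small Python sketch (assuming
roughly one NFS lookup round trip per file and taking the 1.5 s result
for 8 threads from the table above):

  rtt = 0.2e-3     # measured round-trip delay in seconds
  walk_time = 1.5  # walker run time with 8 threads
  for parallel in (6, 8):            # parallel requests per round trip
      rate = parallel / rtt          # sustained requests per second
      print(rate, rate * walk_time)  # roughly 30k-40k req/s, 45k-60k files

That reproduces your 45k - 60k figure for the round-trip reading of the
0.2 ms delay, which is still short of the 70k files in the tree.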

> You should also run this test without the delay. Again, this will give
> you a target baseline for what you can hope to get out of NFS. It
> should saturate around threads = cores, but should probably be
> marginally faster than that 1.5s number there.

Okay, here are tests without the delay -- raw speed on the local
loopback. I unmount the NFS filesystem after each test but do not clear
any other caches:

  threads  pywalker  walker
    1      2230 ms   1931 ms
    2      1857 ms   1164 ms
    4      2594 ms    818 ms
    8      2757 ms    833 ms
   16      2796 ms    991 ms
   32      2776 ms    987 ms
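
For reference, the harness for these runs is roughly the following
Python sketch (the export path and mount point are placeholders, not my
actual setup; 'walker -t N -q' is the C walker from above):

  import subprocess, time

  MOUNT = ['sudo', 'mount', '-t', 'nfs', 'localhost:/export/oo', '/mnt/nfs']
  UMOUNT = ['sudo', 'umount', '/mnt/nfs']

  for nthreads in (1, 2, 4, 8, 16, 32):
      subprocess.check_call(MOUNT)
      start = time.time()
      subprocess.check_call(['walker', '-t', str(nthreads), '-q'],
                            cwd='/mnt/nfs')
      print(nthreads, '%.0f ms' % ((time.time() - start) * 1000))
      # unmounting drops the client-side caches; the server-side page
      # cache is deliberately left warm (no drop_caches here)
      subprocess.check_call(UMOUNT)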

The eight (hyper-threading) cores were never maxed out while I ran the
tests; they only peaked at about 50% utilization.

At first I thought this was because of how I walk the tree: each worker
thread scans a directory and inserts each subdirectory into a queue. It
then returns and grabs the next directory from the queue. This gives a
breadth-first traversal of the directory tree.
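
In code the strategy is roughly this (a simplified Python sketch, not
the actual pywalker/walker source, but it shows the queue-based walk):

  import os, queue, threading

  def walk(root, nthreads):
      work = queue.Queue()
      work.put(root)

      def worker():
          while True:
              directory = work.get()
              try:
                  for entry in os.listdir(directory):
                      path = os.path.join(directory, entry)
                      if os.path.isdir(path):   # one stat per entry
                          work.put(path)        # scan it later
              finally:
                  work.task_done()

      for _ in range(nthreads):
          threading.Thread(target=worker, daemon=True).start()
      work.join()   # all queued directories have been scanned

  walk('.', 8)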

But there are 203 top-level directories in the root of the OpenOffice
working copy, so all threads will quickly get a directory to work on.
The number of directories at each level is as follows:

  level  directories
   0      203
   1      948
   2     1611
   3     1372
   4     1144
   5     1473
   6      330
   7       71
   8       50
   9       15
  10        6
  11        2
  12        1
  13        0

I used 'ls -d */*(/) | wc -l' with a varying number of '*/' components
to get these numbers.
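
The same counts can also be produced with a small Python sketch like
this one, run from the root of the working copy (just an alternative to
the glob; it skips dot-directories such as .hg, as the glob does):

  import os
  from collections import Counter

  def dirs_per_level(root):
      counts = Counter()
      for dirpath, dirnames, filenames in os.walk(root):
          # prune dot-directories (.hg etc.) so they are neither
          # counted nor descended into
          dirnames[:] = [d for d in dirnames if not d.startswith('.')]
          level = dirpath[len(root):].count(os.sep)
          counts[level] += len(dirnames)
      return counts

  for level, n in sorted(dirs_per_level('.').items()):
      print(level, n)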

-- 
Martin Geisler

aragost Trifork
Professional Mercurial support
http://aragost.com/en/services/mercurial/blog/

