[PATCH 2 of 2] hgext: add a new packed repository extension, packrepo

Matt Mackall mpm at selenic.com
Fri Jul 13 15:19:01 CDT 2012


On Sat, 2012-06-30 at 14:51 -0400, Greg Ward wrote:
> On 27 June 2012, Bryan O'Sullivan said:
> > # HG changeset patch
> > # User Bryan O'Sullivan <bryano at fb.com>
> > # Date 1340818576 25200
> > # Node ID 989483d028d1881eb775815c9d66361f9bed4d06
> > # Parent  080b8d275cfe8f355fbe778a175a1c4cf1e08083
> > hgext: add a new packed repository extension, packrepo
> > 
> [...]
> > +This extension is aimed at a narrow use case: the performance of "hg
> > +clone --uncompressed" on a repository that contains many revlogs, over
> > +a fast LAN or WAN.
> > +
> > +In these cases, the performance of "hg clone --uncompressed" can be
> > +limited by the seek rate of the client's disk.  Use of this extension
> > +can improve performance by a factor of 10.  (Please be sure to measure
> > +performance in your own environment before you decide to use this
> > +extension.)
> 
> No kidding! I've been meaning to investigate this for ages now, and
> just never got around to it. Thank you! I'm totally surprised that the
> bottleneck is on the client... but considering the crappy I/O
> performance of the machines we put on every developer's desk, and the
> fact that our server is (I think) configured for pretty good I/O, I
> should not be surprised.

I don't think we've fully understood the underlying issues here. 

My working theory has been that we're thrashing the journal. That
scenario works like this:

When we write out files, at first all the data will land in cache, more
or less immediately. If we hit lots of files, the kernel will say, "ok,
time to flush some of this to disk" (ext3 flushes from cache to journal
every 5 seconds by default). And then we get a mix of a) contiguous
writes to the journal (fast) and b) mostly-contiguous writes to data
areas (fast).

When we write even more files, however, we'll fill the journal and force
a journal flush, which means we'll mix in c) contiguous reads from the
journal (fast) and d) random writes to update the filesystem metadata
(SLOW)... all interleaved with our ongoing efforts to stream files onto
the disk (SLOW).

In short, journalling sucks when the journal is smaller than your
working set. So writing a single big file bypasses that.
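
If someone wants to poke at this theory directly, here's a quick
stand-alone Python sketch (my own, not part of the patch; the file
count and chunk size are arbitrary) that writes the same number of
bytes first as a pile of small files and then as one big file. If
journal thrashing is the culprit, the small-file case should fall off
a cliff once the working set outgrows the journal:

    import os, shutil, tempfile, time

    CHUNK = 'x' * 4096        # 4KB per file, roughly revlog-sized
    NFILES = 100000           # ~400MB total either way

    def timed(label, func, d):
        start = time.time()
        func(d)
        os.system('sync')     # flush dirty pages so we measure real I/O
        print '%s: %.1fs' % (label, time.time() - start)

    def many_small_files(d):
        # one file per chunk, spread over subdirectories like .hg/store
        for i in xrange(NFILES):
            sub = os.path.join(d, '%02d' % (i % 100))
            if not os.path.isdir(sub):
                os.makedirs(sub)
            f = open(os.path.join(sub, str(i)), 'wb')
            f.write(CHUNK)
            f.close()

    def one_big_file(d):
        # same bytes, single append-only file
        f = open(os.path.join(d, 'pack'), 'wb')
        for i in xrange(NFILES):
            f.write(CHUNK)
        f.close()

    for label, func in [('small files', many_small_files),
                        ('one big file', one_big_file)]:
        d = tempfile.mkdtemp(prefix='journaltest-', dir='.')
        try:
            timed(label, func, d)
        finally:
            shutil.rmtree(d)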

Under such a theory, I would expect the following three tests to yield
very similar times:

 cp -a /ramfs/bare-repo /home/repo  (baseline)
 hg clone -U /ramfs/bare-repo /home/repo
 hg serve -d -R /ramfs/bare-repo; hg clone http://localhost:8000 /home/repo

For the Mozilla-central repo, the first two come in at about 1m, with cp
being marginally faster. But the hg serve case is coming in at over 2m
without any sign of a CPU bottleneck.

A wget of the data stream to /dev/null takes 20s, so most of the time is
spent on the client side.

And lastly, a tar | nc & nc | tar pair takes 40s. This is a pretty good
analogy for what we're trying to do with --uncompressed, and it's
actually faster than a raw copy because reading and writing overlap
instead of happening serially. So I have to conclude that something is
broken in our client side for --uncompressed. My hunch is that we're
buffering in such a way that we introduce pipeline stalls.
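
To make that hunch concrete: the lockstep pattern I'm suspicious of
looks roughly like the first function below, while the second shows
the kind of decoupling tar | nc gets for free from the pipe buffer and
two processes. This is only a hypothetical sketch of the two shapes,
not the actual stream clone code:

    import Queue, threading

    def copy_lockstep(src, dst, bufsize=65536):
        # each read waits for the previous write and vice versa, so the
        # network and the disk are never busy at the same time
        while True:
            chunk = src.read(bufsize)
            if not chunk:
                break
            dst.write(chunk)

    def copy_overlapped(src, dst, bufsize=65536, depth=32):
        # a small bounded queue between a reader and a writer thread
        # keeps both sides busy, like the two ends of tar | nc
        q = Queue.Queue(depth)

        def writer():
            while True:
                chunk = q.get()
                if chunk is None:
                    break
                dst.write(chunk)

        t = threading.Thread(target=writer)
        t.start()
        while True:
            chunk = src.read(bufsize)
            q.put(chunk or None)
            if not chunk:
                break
        t.join()

Whether our --uncompressed client path actually degenerates into the
first shape is exactly what needs checking.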

Also, the journal thrashing effect doesn't seem to be a factor here,
even on a repo with 100k files.

-- 
Mathematics is the supreme nostalgia of our time.



