[PATCH 2 of 2] hgext: add a new packed repository extension, packrepo

Bryan O'Sullivan bos at serpentine.com
Mon Jul 2 18:48:52 CDT 2012


On Sat, Jun 30, 2012 at 11:51 AM, Greg Ward <greg at gerg.ca> wrote:

>
> No kidding! I've been meaning to investigate this for ages now, and
> just never got around to it. Thank you! I'm totally surprised that the
> bottleneck is on the client...


It depends on the specifics of your environment. But with CPU out of the
way (by far the biggest problem with a normal clone), there are still
plenty of other possible bottlenecks: network bandwidth, network latency,
client disk seek rate, and client disk bandwidth.

The packrepo extension assumes that network bandwidth and latency are good,
and focuses on the disk seek rate on the client.

Most revlogs are very small (< 1 disk block), and writing them out
individually is potentially very expensive, as it will usually involve a
couple of disk seeks per file. SSDs can help here because they massively
reduce the cost of seeks, but Macs and PCs manage to claw back a lot of
that advantage with their horrendously slow filesystems.

The packrepo extension writes out a single big linear file instead, which
is a substantial win. It also writes out an index so that the revlogs in
the main file can be found.
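
As a rough illustration of the "one linear file plus an index" idea, here
is a minimal Python sketch. It is not the actual packrepo on-disk format:
the record layout and the names pack, read_index and fetch are all
hypothetical, chosen only to show the shape of the approach.

```python
import io
import struct

# Hypothetical index record header: name length, data offset, data length.
HEADER = struct.Struct(">HQI")

def pack(revlogs, data, index):
    """Append every revlog payload to one linear stream and record where it went."""
    offset = 0
    for name, payload in sorted(revlogs.items()):
        data.write(payload)
        encoded = name.encode("utf-8")
        index.write(HEADER.pack(len(encoded), offset, len(payload)))
        index.write(encoded)
        offset += len(payload)

def read_index(index_bytes):
    """Eagerly parse the whole index into a dict of name -> (offset, length)."""
    entries = {}
    pos = 0
    while pos < len(index_bytes):
        namelen, offset, length = HEADER.unpack_from(index_bytes, pos)
        pos += HEADER.size
        name = index_bytes[pos:pos + namelen].decode("utf-8")
        pos += namelen
        entries[name] = (offset, length)
    return entries

def fetch(data_bytes, entries, name):
    """Slice one revlog back out of the packed data."""
    offset, length = entries[name]
    return data_bytes[offset:offset + length]
```

Writing the data this way is a single sequential pass rather than a couple
of seeks per revlog, which is where the win comes from.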

The current index format and its parser are really a proof of concept: the
parser eagerly parses the entire index, and that becomes the new
bottleneck. I'm currently trying to figure out a good on-disk format for
very fast indexing.
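
To make the eager-parsing cost concrete: one way an index can avoid being
parsed in full is to use fixed-width, sorted records, so that a lookup
touches only O(log n) of them. The Python sketch below is purely an
assumption for illustration, not a format proposed in this thread; the
record layout and the names build_index and lookup are hypothetical.

```python
import hashlib
import struct

# Hypothetical fixed-width record: sha1 of the revlog name, offset, length.
RECORD = struct.Struct(">20sQI")

def build_index(entries):
    """Turn a dict of name -> (offset, length) into sorted fixed-width records."""
    records = sorted(
        (hashlib.sha1(name.encode("utf-8")).digest(), offset, length)
        for name, (offset, length) in entries.items()
    )
    return b"".join(RECORD.pack(*record) for record in records)

def lookup(index_bytes, name):
    """Binary-search the records without decoding the rest of the index."""
    key = hashlib.sha1(name.encode("utf-8")).digest()
    lo, hi = 0, len(index_bytes) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        digest, offset, length = RECORD.unpack_from(index_bytes, mid * RECORD.size)
        if digest == key:
            return offset, length
        if digest < key:
            lo = mid + 1
        else:
            hi = mid
    return None
```

Because every record has the same size, index_bytes could just as well be
an mmap of the index file, so only the pages actually probed get read.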

> One stupid question just to clarify: this extension is meant to be
> used on *client* machines, right?
>

Yes.