On compressing revlogs

Bryan O'Sullivan bos at serpentine.com
Mon Jun 25 16:38:14 CDT 2012


On Mon, Jun 25, 2012 at 4:11 PM, Isaac Jurado <diptongo at gmail.com> wrote:

>
> But growing the repository size slows down operations when the OS
> page-cache is cold, doesn't it?


No. Some constants to keep in mind:

   - Seek throughput: ~300 half-stroke seeks/sec
   - Linear read throughput on a slow modern disk: 80 MB/sec
   - zlib decompression throughput: 30 MB/sec
   - lz4 decompression throughput: 1100 MB/sec
   - sha1 throughput: 285 MB/sec

Now I'm going to do some back-of-the-envelope arithmetic. You're welcome to
quibble with the details, but that would be missing the point.

Suppose we need to update 10,000 files, each containing 1K of compressed
fulltext.

With warm caches, we can read and decompress at about 30 MB/sec using
normal hg, so to a first approximation we're only paying for zlib and sha1:
0.3 seconds for zlib, 0.03 for sha1. With lz4, decompression time drops to
0.008 seconds, so sha1 becomes our bottleneck.
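To make that concrete, here's the warm-cache estimate as a tiny Python
sketch. The constants are just the ballpark throughput numbers above, not
measurements, so swap in your own:

    # Warm-cache model: data is already in the page cache, so the only
    # costs are decompression and sha1 verification.
    FILES = 10000
    DATA_MB = FILES * 1024 / 1e6   # ~10 MB of 1K compressed fulltexts

    ZLIB_MBPS = 30.0    # zlib decompression throughput
    LZ4_MBPS = 1100.0   # lz4 decompression throughput
    SHA1_MBPS = 285.0   # sha1 throughput

    print('zlib: %.2f s' % (DATA_MB / ZLIB_MBPS + DATA_MB / SHA1_MBPS))
    # -> ~0.38 s, dominated by zlib
    print('lz4:  %.3f s' % (DATA_MB / LZ4_MBPS + DATA_MB / SHA1_MBPS))
    # -> ~0.045 s, dominated by sha1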

With cold caches, every file access requires a seek and a read: 33.3
seconds for seeks, 0.51 seconds for reads (these files are really 4KB,
thanks to filesystem block size rounding), plus the numbers above:
33.3+0.51+0.3+0.03 = 34.14 seconds in total.

Switch to lz4 for disk storage, so that the space grows by the worst case
I've seen: 30%. Now with a cold cache we haven't changed the number of
seeks, and reads still cost us 0.51 seconds, because our files are still
smaller
than 4KB blocks. But our decompression is way faster, so now we pay a total
of 33.3+0.51+0.008+0.03 = 33.85 seconds.
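The whole cold-cache comparison fits in a few more lines of the same
back-of-the-envelope model, with the 30% growth figure as an assumed worst
case:

    # Cold-cache model: one half-stroke seek plus one 4KB block read per
    # file, then decompression and sha1 as before.
    FILES = 10000
    SEEKS_PER_SEC = 300.0
    READ_MBPS = 80.0
    ZLIB_MBPS, LZ4_MBPS, SHA1_MBPS = 30.0, 1100.0, 285.0

    seek_s = FILES / SEEKS_PER_SEC           # 33.3 s, identical either way
    read_s = FILES * 4096 / 1e6 / READ_MBPS  # ~0.51 s: 4KB blocks either way

    zlib_mb = FILES * 1024 / 1e6             # ~10 MB compressed with zlib
    lz4_mb = zlib_mb * 1.3                   # assumed worst-case 30% growth
    sha1_s = zlib_mb / SHA1_MBPS             # sha1 cost, same for both codecs

    print('zlib: %.2f s' % (seek_s + read_s + zlib_mb / ZLIB_MBPS + sha1_s))
    # -> ~34.2 s
    print('lz4:  %.2f s' % (seek_s + read_s + lz4_mb / LZ4_MBPS + sha1_s))
    # -> ~33.9 s: more bytes on disk, yet slightly faster cold

Plug in your own disk's numbers if you like; the seek term dwarfs everything
else by two orders of magnitude.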

In other words, we're storing more data on disk, yet cold-cache performance
can actually improve, if only imperceptibly.