On compressing revlogs

Jesper Schmidt schmiidt at gmail.com
Mon Jun 4 18:52:51 CDT 2012


On Mon, 04 Jun 2012 23:07:27 +0200, Bryan O'Sullivan <bos at serpentine.com>  
wrote:

> Lately, I've come to suspect zlib as a performance bottleneck for reading
> data from revlogs. I threw together a quick hack this morning to use the
> snappy compression algorithm instead. Here's what I've found.
>
> Snappy compression is up to 15x faster than zlib (haven't seen it be less
> than 8x faster), while decompression is up to 4x faster (haven't seen it
> less than 2x faster). Of course there's a tradeoff: poorer compression
> ratios, about 1.5x larger than zlib in my tests.

I recently benchmarked a few different compression algorithms and stumbled
across one called lz4 (http://code.google.com/p/lz4/), which consistently
outperformed Snappy in both compression ratio and compression/decompression
speed. Below are some results from a test run I just did on the Mercurial
source tree (in-memory); a rough sketch of the harness follows the table.

         input       output (ratio) comp       decomp
lz4      13.1 MB -> 5.0 MB (38.0%) 313.4 MB/s 912.8 MB/s
lz4hc    13.1 MB -> 3.6 MB (27.1%)  20.7 MB/s 975.3 MB/s
snappy   13.1 MB -> 5.2 MB (39.2%) 149.4 MB/s 579.5 MB/s
zlib(1)  13.1 MB -> 3.8 MB (28.5%)  47.5 MB/s 181.3 MB/s
zlib(2)  13.1 MB -> 3.6 MB (27.2%)  42.5 MB/s 190.6 MB/s
zlib(3)  13.1 MB -> 3.4 MB (26.2%)  35.4 MB/s 191.5 MB/s
zlib(4)  13.1 MB -> 3.2 MB (24.6%)  30.8 MB/s 192.2 MB/s
zlib(5)  13.1 MB -> 3.1 MB (23.7%)  22.2 MB/s 199.6 MB/s
zlib(6)  13.1 MB -> 3.1 MB (23.3%)  16.8 MB/s 195.9 MB/s
zlib(-1) 13.1 MB -> 3.1 MB (23.3%)  16.9 MB/s 205.2 MB/s
bzip2(1) 13.1 MB -> 2.8 MB (21.4%)   8.6 MB/s  36.2 MB/s
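
For anyone who wants to poke at this, here is a rough sketch of such an
in-memory harness. It is not the exact script behind the numbers above: it
is written for Python 3 and assumes the python-lz4 (lz4.block) and
python-snappy bindings, skipping any codec whose binding is not installed;
zlib and bz2 come from the standard library.

#!/usr/bin/env python3
# Rough sketch: concatenate a source tree into memory and time a
# compress/decompress round trip with several codecs.  Assumes the
# python-lz4 and python-snappy bindings; falls back to stdlib-only
# (zlib, bz2) if they are missing.

import bz2
import os
import sys
import time
import zlib


def load_tree(root):
    """Read every regular file under root into one in-memory blob."""
    chunks = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                with open(os.path.join(dirpath, name), 'rb') as fp:
                    chunks.append(fp.read())
            except OSError:
                pass  # unreadable file: skip it
    return b''.join(chunks)


def bench(label, data, compress, decompress):
    """Time one round trip and print ratio and throughput (MB/s of input)."""
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    unpacked = decompress(packed)
    t2 = time.perf_counter()
    assert unpacked == data, 'round trip failed for %s' % label
    mb = len(data) / 1e6
    print('%-8s %5.1f%%  comp %7.1f MB/s  decomp %7.1f MB/s'
          % (label, 100.0 * len(packed) / len(data),
             mb / (t1 - t0), mb / (t2 - t1)))


def main(root):
    data = load_tree(root)
    print('input: %.1f MB' % (len(data) / 1e6))

    for level in (1, 6, -1):
        bench('zlib(%d)' % level, data,
              lambda d, lvl=level: zlib.compress(d, lvl), zlib.decompress)
    bench('bzip2(1)', data, lambda d: bz2.compress(d, 1), bz2.decompress)

    try:
        import lz4.block
        bench('lz4', data, lz4.block.compress, lz4.block.decompress)
        bench('lz4hc', data,
              lambda d: lz4.block.compress(d, mode='high_compression'),
              lz4.block.decompress)
    except ImportError:
        print('python-lz4 not installed, skipping lz4/lz4hc')

    try:
        import snappy
        bench('snappy', data, snappy.compress, snappy.decompress)
    except ImportError:
        print('python-snappy not installed, skipping snappy')


if __name__ == '__main__':
    main(sys.argv[1] if len(sys.argv) > 1 else '.')

Run it against a checkout (say "python3 bench_compress.py path/to/hg"); the
absolute numbers will of course vary with hardware, Python version and
binding versions, but the relative picture should look like the table above.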

lz4hc is a high-compression variant of lz4, which might offer a better
tradeoff in your case (write once, read many).

-- 
Jesper

