[PATCH 8 of 8 zstd-revlogs] [RFC] localrepo: support non-zlib compression engines in revlogs

Mike Hommey mh at glandium.org
Thu Jan 5 02:27:10 EST 2017


On Wed, Jan 04, 2017 at 11:18:21PM -0800, Gregory Szorc wrote:
> * The lz4 performance note in the commit message isn't very accurate. There
> is a small subset of operations where the zstd python bindings are as fast
> as lz4. I'll strike the comment from the next version.
> 
> * zlib has checksums built into the compression format with how it is used
> in hg today. The patches as written do not have zstd writing checksums.
> 
> * Enabling checksums in zstd appears to have a negligible impact on
> performance.
> 
> * Reusing zstd compression and decompression "contexts" can make a
> significant difference to performance. Having a reusable "compressor"
> object that allows "context" reuse should increase performance for zstd.
> 
> * For the changelog, zstd level=1 versus level=3 makes almost no difference
> on compression ratio but does speed up compression a bit. Now I'm
> considering per-revlog settings for the compressors.
> 
> * zstd compression dictionaries speed up *both* compression and
> decompression. On changelog chunks, dictionaries improve decompress
> throughput from ~180 MB/s to ~300 MB/s. That's nothing to sneeze at.
> 
> * When dictionaries are used, zstd level=1 compresses the changelog
> considerably faster than level=3. ~160 MB/s vs ~27 MB/s.
> 
> * I was going to hold off seriously investigating compression dictionaries,
> but since there are massive perf win potentials, I think it should be done
> sooner than later.

All this perf information about dictionaries makes me wonder whether
there is a corpus of non-English changesets that could be used for some
different performance measurements. It's nice to know that things are
better for English content, but version control is not exclusive to
people writing everything in English.
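
A rough sketch of the kind of comparison this suggests. It is not the
zstd code from this series: it swaps in the preset-dictionary support of
Python's stdlib zlib (the zdict argument) as a stand-in, because the idea
is the same (prime the compressor with bytes that resemble the data). The
dictionary contents, the two sample messages, and the helper names are
all made up for illustration, and real measurements would need a real
corpus.

```python
import zlib

# Hypothetical dictionary built from English-looking changeset text.
ENGLISH_DICT = (
    b"changeset: fix compression of revlog chunks\n"
    b"user: \ndate: \nfiles: mercurial/revlog.py\n"
)

# Hypothetical commit messages: one English, one Japanese (UTF-8).
ENGLISH_MSG = b"fix compression of revlog chunks when the delta is empty"
JAPANESE_MSG = "リビジョンログの圧縮を修正".encode("utf-8")


def compressed_size(payload, zdict=b""):
    """Return the zlib-compressed size of payload, optionally with a preset dictionary."""
    if zdict:
        comp = zlib.compressobj(level=6, zdict=zdict)
    else:
        comp = zlib.compressobj(level=6)
    return len(comp.compress(payload) + comp.flush())


def roundtrip(payload, zdict):
    """Compress and decompress with the same dictionary; both sides must agree on it."""
    comp = zlib.compressobj(level=6, zdict=zdict)
    data = comp.compress(payload) + comp.flush()
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(data) + decomp.flush()


def dictionary_saving(payload, zdict):
    """Bytes saved by using the dictionary (can be negative if it does not help)."""
    return compressed_size(payload) - compressed_size(payload, zdict)


if __name__ == "__main__":
    for name, msg in (("english", ENGLISH_MSG), ("japanese", JAPANESE_MSG)):
        print(name, "bytes saved by dictionary:", dictionary_saving(msg, ENGLISH_DICT))
```

With a dictionary trained only on English text, content in another
language has little to match against, so the saving can shrink or vanish.
That is the effect a non-English corpus would let us quantify.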

Mike

