[PATCH 8 of 9] util: support multi-threaded zstandard compression for bundles

Gregory Szorc gregory.szorc at gmail.com
Sat Apr 1 18:31:59 EDT 2017


# HG changeset patch
# User Gregory Szorc <gregory.szorc at gmail.com>
# Date 1491085720 25200
#      Sat Apr 01 15:28:40 2017 -0700
# Node ID c171ff452d36d4caeab1887df4da43e45a61022a
# Parent  7f469db35d4a303d53b4c7dc141a8378fa67d9e1
util: support multi-threaded zstandard compression for bundles

python-zstandard 0.8.0 exposes support for multi-threaded compression.
This seems like a useful feature to make available to operations
that perform streaming compression.

This patch implements a bundlespec parameter for the zstandard
compression engine that defines how many threads to use for
compression. When the parameter is absent, the existing behavior
of using a single thread is retained. I could make a compelling
argument for defaulting to as many threads as the machine has
cores (because why wouldn't you want the operation to complete as
fast as possible?). That said, multi-threaded compression behaves
subtly differently from single-threaded compression, so a single
thread is a reasonable default for the initial feature.

On the mozilla-unified repo on my i7-6700K, `hg bundle -a` times
for various bundlespecs are as follows:

! zstd-v2;compressionlevel=3
121s wall; 119s user; 1,082,202,822 bytes
! zstd-v2;compressionlevel=3;compressionthreads=4
105s wall; 117s user; 1,088,479,719 bytes
! zstd-v2;compressionlevel=15
492s wall; 489s user; 899,048,902 bytes
! zstd-v2;compressionlevel=15;compressionthreads=4
151s wall; 620s user; 905,597,146 bytes

Multi-threaded compression doesn't help much at level=3 because
Mercurial can't deliver data to zstandard fast enough to keep the
extra threads busy. At level=15, single-threaded operation is
significantly slower than level=3, but multi-threaded compression
claws back most of the wall time difference by throwing more CPU
cores at the problem, while preserving most of the ~180 MB size
reduction. Not bad.

The zstandard format is such that the consumer neither knows nor
cares that the producer used multi-threaded compression, and there
are no significant performance implications for the client. The
only difference is that the "deltas" within the compressed stream
are slightly different because of how the compressor splits work
across threads; that's why multi-threaded compressed data is
typically a little larger.

diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -3616,6 +3616,16 @@ class _zstdengine(compressionengine):
         decompression. Compression levels higher than ``19`` can require
         hundreds of megabytes of memory and may exhaust memory in 32-bit
         processes.
+
+        The ``compressionthreads`` parameter defines the number of threads
+        to use for compression. The default is to use a single thread.
+        A negative value will attempt to detect the number of available CPU
+        cores and use that many threads. For many compression levels,
+        Mercurial won't be able to send data to the compressor fast enough
+        for multi-threaded compression to make compression faster. The use
+        of multi-threaded compression can be combined with a higher
+        ``compressionlevel`` to achieve better compression ratios while
+        still completing in a reasonable amount of time.
         """
         return 'zstd', 'ZS'
 
@@ -3638,6 +3648,16 @@ class _zstdengine(compressionengine):
 
             opts['level'] = level
 
+        if 'compressionthreads' in params:
+            relevant.add('compressionthreads')
+            try:
+                threads = int(params['compressionthreads'])
+            except ValueError:
+                raise ValueError(_('compressionthreads "%s" is not an '
+                                   'integer') % params['compressionthreads'])
+
+            opts['threads'] = threads
+
         return relevant, opts
 
     def wireprotosupport(self):
@@ -3652,9 +3672,10 @@ class _zstdengine(compressionengine):
         # while providing no worse compression. It strikes a good balance
         # between speed and compression.
         level = opts.get('level', 3)
+        threads = opts.get('threads', 0)
 
         zstd = self._module
-        z = zstd.ZstdCompressor(level=level).compressobj()
+        z = zstd.ZstdCompressor(level=level, threads=threads).compressobj()
         for chunk in it:
             data = z.compress(chunk)
             if data:
diff --git a/tests/test-bundle-type.t b/tests/test-bundle-type.t
--- a/tests/test-bundle-type.t
+++ b/tests/test-bundle-type.t
@@ -238,6 +238,19 @@ compressionlevel parameter is parsed and
   bundle2-output-bundle: "HG20", (1 params) 1 parts total
   bundle2-output-part: "changegroup" (params: 1 mandatory 1 advisory) streamed payload
 
+multi-threaded bundlespec parameter is parsed and used
+(we can't easily test this has an effect on the output)
+
+  $ hg --debug -R b1 bundle -a -t 'zstd-v2;compressionthreads=2' zstd-mt
+  bundle2-input-part: total payload size 503
+  1 changesets found
+  list of changesets:
+  c35a0f9217e65d1fdb90c936ffa7dbe679f83ddf
+  using compression engine zstd
+  engine option: threads=2
+  bundle2-output-bundle: "HG20", (1 params) 1 parts total
+  bundle2-output-part: "changegroup" (params: 1 mandatory 1 advisory) streamed payload
+
 #else
 
 zstd is a valid engine but isn't available


More information about the Mercurial-devel mailing list