[PATCH] repair: compress bundles with gzip instead of bzip2 (BC)

Gregory Szorc gregory.szorc at gmail.com
Sun May 14 02:33:27 EDT 2017

# HG changeset patch
# User Gregory Szorc <gregory.szorc at gmail.com>
# Date 1494743508 25200
#      Sat May 13 23:31:48 2017 -0700
# Node ID 03c14dca1ff875ed9f4650166ada89baa02c5843
# Parent  78496ac300255e9996b3e282086661afc08af37c
repair: compress bundles with gzip instead of bzip2 (BC)

repair.strip() produces up to 2 bundles: a persisted "backup"
bundle (containing stripped data) and a temporary bundle
(containing data that will be re-applied post strip). The
temporary bundle is uncompressed and the backup bundle is
compressed with bzip2.

I imagine bzip2 was chosen as the compression format for backup
bundles because it yields the best compression. That makes sense:
you don't want backup bundles wasting space. But at the same time,
you want operations to complete fast as well.

bzip2 is likely 3-5x slower than zlib/gip. If you are compressing
a large amount of data, that speed difference can translate to
seconds. It can also translate to lost megabytes. However,
storage space gets cheaper every year and single core CPU
performance isn't scaling as well as it was 10 years ago when
this default was likely chosen. I think we should go with the
faster compression algorithm at the expense of larger bundles.

I *would* like to change the default to zstd (if available). However,
that's a significant BC break and I don't feel comfortable changing
it just yet. When we have official zstd revlog support and the
repo is using it, I think it makes sense to use zstd for backup
bundles. But for zlib-based revlogs, it's safer to avoid zstd
for the time being.

This change does make some `hg strip` operations faster if large
amounts of data are being compressed. However, execution time is
often dominated by creating and applying the temporary bundle, so
the benefit likely isn't night and day.

diff --git a/mercurial/repair.py b/mercurial/repair.py
--- a/mercurial/repair.py
+++ b/mercurial/repair.py
@@ -43,9 +43,9 @@ def _bundle(repo, bases, heads, node, su
     if cgversion != '01':
         bundletype = "HG20"
         if compress:
-            comp = 'BZ'
+            comp = 'GZ'
     elif compress:
-        bundletype = "HG10BZ"
+        bundletype = "HG10GZ"
         bundletype = "HG10UN"
     return bundle2.writebundle(repo.ui, cg, name, bundletype, vfs,
diff --git a/tests/test-generaldelta.t b/tests/test-generaldelta.t
--- a/tests/test-generaldelta.t
+++ b/tests/test-generaldelta.t
@@ -154,7 +154,7 @@ Test that strip bundle use bundle2
   0 files updated, 0 files merged, 5 files removed, 0 files unresolved
   saved backup bundle to $TESTTMP/aggressive/.hg/strip-backup/1c5d4dc9a8b8-6c68e60c-backup.hg (glob)
   $ hg debugbundle .hg/strip-backup/*
-  Stream params: sortdict([('Compression', 'BZ')])
+  Stream params: sortdict([('Compression', 'GZ')])
   changegroup -- "sortdict([('version', '02'), ('nbchanges', '1')])"
diff --git a/tests/test-strip.t b/tests/test-strip.t
--- a/tests/test-strip.t
+++ b/tests/test-strip.t
@@ -210,7 +210,7 @@
   summary:     b
   $ hg debugbundle .hg/strip-backup/*
-  Stream params: sortdict([('Compression', 'BZ')])
+  Stream params: sortdict([('Compression', 'GZ')])
   changegroup -- "sortdict([('version', '02'), ('nbchanges', '1')])"
   $ hg pull .hg/strip-backup/*

More information about the Mercurial-devel mailing list