[PATCH] repair: compress bundles with gzip instead of bzip2 (BC)

Augie Fackler raf at durin42.com
Mon May 15 13:39:11 EDT 2017


On Sat, May 13, 2017 at 11:33:27PM -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1494743508 25200
> #      Sat May 13 23:31:48 2017 -0700
> # Node ID 03c14dca1ff875ed9f4650166ada89baa02c5843
> # Parent  78496ac300255e9996b3e282086661afc08af37c
> repair: compress bundles with gzip instead of bzip2 (BC)

Can you give me an idea of how much bigger the bundles are on disk?
Most users never clean these up.

> repair.strip() produces up to 2 bundles: a persisted "backup"
> bundle (containing stripped data) and a temporary bundle
> (containing data that will be re-applied post strip). The
> temporary bundle is uncompressed and the backup bundle is
> compressed with bzip2.

Should that temporary bundle be hit with zstd if available? It's
scoped to a single execution (more or less), so it ought to be safe
and a nearly-free IO win for minimal CPU.

>
> I imagine bzip2 was chosen as the compression format for backup
> bundles because it yields the best compression. That makes sense:
> you don't want backup bundles wasting space. But at the same time,
> you want operations to complete fast as well.
>
> bzip2 is likely 3-5x slower than zlib/gip. If you are compressing
> a large amount of data, that speed difference can translate to
> seconds. It can also translate to lost megabytes. However,
> storage space gets cheaper every year and single core CPU
> performance isn't scaling as well as it was 10 years ago when
> this default was likely chosen. I think we should go with the
> faster compression algorithm at the expense of larger bundles.
>
> I *would* like to change the default to zstd (if available). However,
> that's a significant BC break and I don't feel comfortable changing
> it just yet. When we have official zstd revlog support and the
> repo is using it, I think it makes sense to use zstd for backup
> bundles. But for zlib-based revlogs, it's safer to avoid zstd
> for the time being.
>
> This change does make some `hg strip` operations faster if large
> amounts of data are being compressed. However, execution time is
> often dominated by creating and applying the temporary bundle, so
> the benefit likely isn't night and day.
>
> diff --git a/mercurial/repair.py b/mercurial/repair.py
> --- a/mercurial/repair.py
> +++ b/mercurial/repair.py
> @@ -43,9 +43,9 @@ def _bundle(repo, bases, heads, node, su
>      if cgversion != '01':
>          bundletype = "HG20"
>          if compress:
> -            comp = 'BZ'
> +            comp = 'GZ'
>      elif compress:
> -        bundletype = "HG10BZ"
> +        bundletype = "HG10GZ"
>      else:
>          bundletype = "HG10UN"
>      return bundle2.writebundle(repo.ui, cg, name, bundletype, vfs,
> diff --git a/tests/test-generaldelta.t b/tests/test-generaldelta.t
> --- a/tests/test-generaldelta.t
> +++ b/tests/test-generaldelta.t
> @@ -154,7 +154,7 @@ Test that strip bundle use bundle2
>    0 files updated, 0 files merged, 5 files removed, 0 files unresolved
>    saved backup bundle to $TESTTMP/aggressive/.hg/strip-backup/1c5d4dc9a8b8-6c68e60c-backup.hg (glob)
>    $ hg debugbundle .hg/strip-backup/*
> -  Stream params: sortdict([('Compression', 'BZ')])
> +  Stream params: sortdict([('Compression', 'GZ')])
>    changegroup -- "sortdict([('version', '02'), ('nbchanges', '1')])"
>        1c5d4dc9a8b8d6e1750966d343e94db665e7a1e9
>
> diff --git a/tests/test-strip.t b/tests/test-strip.t
> --- a/tests/test-strip.t
> +++ b/tests/test-strip.t
> @@ -210,7 +210,7 @@
>    summary:     b
>
>    $ hg debugbundle .hg/strip-backup/*
> -  Stream params: sortdict([('Compression', 'BZ')])
> +  Stream params: sortdict([('Compression', 'GZ')])
>    changegroup -- "sortdict([('version', '02'), ('nbchanges', '1')])"
>        264128213d290d868c54642d13aeaa3675551a78
>    $ hg pull .hg/strip-backup/*
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


More information about the Mercurial-devel mailing list