Experiments with LZMA compression of bundles

Matt Mackall mpm at selenic.com
Fri May 20 16:00:46 CDT 2011


On Fri, 2011-05-20 at 22:50 +0200, Sune Foldager wrote:
> On 20-05-2011 13:34, Benoit Boissinot wrote:
> > On Fri, May 20, 2011 at 12:20 PM, Sune Foldager<cryo at cyanite.org>  wrote:
> >> - hg bundle and hg push http:// write/send full bundles with proper
> >>   compression. For the http push case, a capability determines the
> >>   supported types.
> >> - hg push ssh:// sends headerless bundles with no compression.
> >> - hg pull (both getbundle and changegroup(subset)) receives headerless
> >>   bundles, with zlib compression on top of them for http (and none for
> >>   ssh).
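To make the cases above concrete, here is a minimal Python sketch of how a receiver might tell the formats apart once any http zlib wrapper has been removed. It assumes Mercurial's HG10 headers (HG10UN, HG10GZ, HG10BZ) and that a headerless bundle is simply a raw, uncompressed changegroup. `parse_bundle` is a hypothetical name, and it holds the whole bundle in memory where the real code streams.

```python
import bz2
import zlib


def parse_bundle(stream: bytes):
    """Return (kind, changegroup bytes) for a bundle held in memory.

    Hypothetical helper: the header strings match Mercurial's HG10
    bundles, but this is not Mercurial's own reader.
    """
    header = stream[:6]
    if not stream.startswith(b"HG"):
        # Headerless bundle: a raw, uncompressed changegroup.
        return "headerless", stream
    if header == b"HG10UN":
        return "HG10UN", stream[6:]
    if header == b"HG10GZ":
        # Despite the name, the payload is a zlib stream, not gzip.
        return "HG10GZ", zlib.decompress(stream[6:])
    if header == b"HG10BZ":
        # The "BZ" in the header doubles as the bz2 stream's own magic,
        # so it has to be put back before decompressing.
        return "HG10BZ", bz2.decompress(b"BZ" + stream[6:])
    raise ValueError("unknown bundle header %r" % header)
```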
> >>
> >> Is this for historical reasons? It seems unnecessarily complicated. Why
> >> wouldn't we always use headered bundles, say, and then have pull use
> >> HG10GZ? Why wrap in a zlib stream and use uncompressed headerless
> >> bundles?
> >
> > Historical reasons, we "forgot" to change that when we added the
> > compression capabilities for unbundle (which is only meaningful for
> > ssh, another screw up).
> 
> You mean http? The unbundle capability is only checked for http in the 
> current code. ssh always sends the headerless uncompressed bundle 
> directly. http sends a headered bundle directly (but the bundle itself 
> will be compressed).
> 
> Would be nice, for consistency, if ssh also accepted headered bundles, 
> but we'd have to add yet another capability to signal it :-p.
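A rough sketch of the kind of capability check involved, assuming the server advertises something like `unbundle=HG10GZ,HG10BZ,HG10UN` in its space-separated capability list and that the client takes the first listed type it knows (my understanding of the http push path). `choose_push_format` and `parse_caps` are hypothetical names; the empty-string result stands for the headerless, uncompressed bundle that ssh push sends today. A new ssh capability could feed the same check.

```python
# Bundle types this (hypothetical) client knows how to write.
KNOWN_TYPES = {"HG10GZ", "HG10BZ", "HG10UN"}


def parse_caps(capstring: str) -> dict:
    """Split 'lookup unbundle=A,B' into {'lookup': True, 'unbundle': 'A,B'}."""
    caps = {}
    for token in capstring.split():
        name, sep, value = token.partition("=")
        caps[name] = value if sep else True
    return caps


def choose_push_format(capstring: str, known=KNOWN_TYPES) -> str:
    value = parse_caps(capstring).get("unbundle")
    if not isinstance(value, str):
        # No type list advertised: fall back to the headerless,
        # uncompressed bundle.
        return ""
    # The server lists types in its order of preference.
    for kind in value.split(","):
        if kind in known:
            return kind
    raise ValueError("no bundle type in common with server: %s" % value)
```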
> 
> >> Why does push (the unbundle remote command) send full, headered bundles
> >> with arbitrary compression in the bundle, and no wrapper compression?
> >>
> >> Is this because pushes are often smaller than pulls? But that still doesn't
> >> address why we don't pull GZ bundles.
> >
> > Historical again, http push was implemented later. Before the
> > wireprotocol unification we didn't spot those inconsistencies as
> > easily as we do now.
> 
> Assuming GZ in bundles is equivalent to the zlib compression wrapper 
> http uses for pull, it would maybe be nice if we received headered, 
> compressed bundles, without wrapper-compression. For consistency. Then 
> ssh could do exactly the same.
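If both sides use a default-level `zlib.compressobj()` (my assumption about the current code), the proposal changes nothing about compression ratio or CPU: the headered stream is the wrapped stream with six header bytes in front. A sketch with hypothetical helper names:

```python
import zlib


def compress_stream(chunks, header=b""):
    """Yield an optional bundle header followed by one zlib stream."""
    z = zlib.compressobj()  # default level, zlib (not gzip) framing
    if header:
        yield header
    for chunk in chunks:
        out = z.compress(chunk)
        if out:
            yield out
    yield z.flush()


def wire_bytes(chunks, headered: bool) -> bytes:
    """Bytes sent for a pull: zlib-wrapped headerless, or an HG10GZ bundle."""
    header = b"HG10GZ" if headered else b""
    return b"".join(compress_stream(chunks, header))
```

So the change would be about consistency between the transports rather than about what goes over the wire.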
> 
> >>> My guess is that the CPU usage on the server will show a much bigger
> >>> difference.
> >>
> >> Yes, for sure; lzma is a bit on the expensive side for compression. Hardware
> >> does get faster, though.
> >>
> >> Anyway, I am not proposing adding LZMA at this time, or anything like
> >> that. These were just some experiments I thought might interest people.
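For anyone repeating the experiment, a minimal harness using Python's standard `zlib`, `bz2` and `lzma` modules that records compressed size and compression time. The levels are my own illustrative choices, not what was used above, and `measure` is a hypothetical name.

```python
import bz2
import lzma
import time
import zlib

# Codec name -> (compress, decompress); levels are illustrative choices.
CODECS = {
    "zlib-6": (lambda d: zlib.compress(d, 6), zlib.decompress),
    "bz2-9": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "xz-6": (lambda d: lzma.compress(d, preset=6), lzma.decompress),
}


def measure(data: bytes) -> dict:
    """Return {codec: (compressed size, compression seconds)} for data."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # Sanity check: every codec must round-trip the input exactly.
        if decompress(packed) != data:
            raise RuntimeError("%s failed to round-trip" % name)
        results[name] = (len(packed), elapsed)
    return results
```

To get meaningful numbers, feed it an uncompressed bundle, e.g. one written with `hg bundle -t none` and read back with `open(path, "rb").read()`.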
> >
> > Yes, experimenting is cool (for fast server side compression, with
> > better compression level than zlib, snappy/zippy would be interesting
> > too).
> 
> Maybe... is it LZ-based? Then maybe liblzma can do the same with the
> right parameters. I think the key advantage of lzma, beyond the larger
> dictionary sizes etc., is the use of arithmetic coding in the entropy
> stage (as bzip1 did).
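On the parameter question: liblzma's knobs are reachable through Python's `lzma` filter chains. The sketch below asks LZMA2 for a small dictionary, a hash-chain match finder and fast mode. These values are my guesses at "cheap" settings and I haven't measured whether they come anywhere near snappy; `FAST_LZMA2` and `fast_xz` are hypothetical names. The range coder (LZMA's arithmetic-coding entropy stage) is used whatever these options say.

```python
import lzma

# Hypothetical "cheap" LZMA2 settings; keys not given here come from preset 0.
FAST_LZMA2 = [{
    "id": lzma.FILTER_LZMA2,
    "preset": 0,
    "dict_size": 1 << 18,     # 256 KiB dictionary
    "mode": lzma.MODE_FAST,   # faster heuristic parser than normal mode
    "mf": lzma.MF_HC3,        # hash-chain match finder, cheaper than binary trees
    "nice_len": 32,           # accept a match of 32+ bytes without searching further
}]


def fast_xz(data: bytes) -> bytes:
    """Compress data as an .xz stream using the fast filter chain."""
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=FAST_LZMA2)
```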

I'd be surprised if that were true. bzip2's alternative to arithmetic
encoding is pretty decent.

You should also try running rzip on an uncompressed bundle for
comparison. It's like bzip2 with an infinite window.
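rzip isn't in Python's standard library, but the window effect it exploits is easy to show with the codecs that are. This sketch is a stand-in, not rzip's algorithm: it compresses two copies of one random block placed 1 MiB apart, so the only redundancy is a repeat that only a dictionary larger than that distance can see. As I understand it, rzip finds such long-distance matches first and then hands the rest to bzip2. `window_demo` is a hypothetical name.

```python
import bz2
import lzma
import os
import zlib


def window_demo(distance: int = 1 << 20) -> dict:
    """Compressed sizes for data whose only redundancy is a repeat `distance` bytes back."""
    block = os.urandom(distance)   # random bytes: incompressible on their own
    data = block + block           # second copy starts exactly `distance` bytes later
    return {
        "zlib": len(zlib.compress(data, 9)),       # 32 KiB window
        "bz2": len(bz2.compress(data, 9)),         # independent ~900 kB blocks
        "xz": len(lzma.compress(data, preset=6)),  # 8 MiB dictionary
    }
```

zlib and bz2 should both come out at roughly the full input size, while xz should land near half of it, because only its dictionary reaches back far enough.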


Usually when the topic of alternate compressors comes up, I point out
that adding a new dependency is painful and that having clients create
bundles that can't be read by other clients is also not great.

-- 
Mathematics is the supreme nostalgia of our time.




More information about the Mercurial-devel mailing list