Experiments with LZMA compression of bundles

Matt Mackall mpm at selenic.com
Fri May 20 16:55:31 CDT 2011


On Fri, 2011-05-20 at 23:36 +0200, Sune Foldager wrote:
> On 2011-05-20 16:00, Matt Mackall wrote:
> >On Fri, 2011-05-20 at 22:50 +0200, Sune Foldager wrote:
> >> On 20-05-2011 13:34, Benoit Boissinot wrote:
> >> > Yes, experimenting is cool (for fast server side compression, with
> >> > better compression level than zlib, snappy/zippy would be interesting
> >> > too).
> >>
> >> Maybe... is it LZ-based? Then maybe liblzma can do the same with proper
> >> parameters set. I think the key advantage with lzma despite the
> >> increased dictionary sizes etc., is the use of arithmetical compression
> >> in the entropy stage (like bzip1 did).
> >
> >I'd be surprised if that were true. bzip2's alternative to arithmetic
> >encoding is pretty decent.
> 
> Hm, no you mean bzip2's alternative to Lempel-Ziv, namely the Burrows-Wheeler
> transform. After applying that, and some other stuff, bzip2 subsequently
> uses standard Huffman encoding:
> 
> "bzip2's ancestor bzip used arithmetic coding instead of Huffman.
> The change was made because of a software patent restriction."

No, I actually did mean the post-transform entropy encoder, but I
remembered it being something more clever than Huffman.

But my claim was not that arithmetic compression was the distinguishing
factor between bzip/bzip2 (it is, of course), but that arithmetic coding
is the distinguishing factor between bzip2 and lzma (which seems to be
what you're saying above). BWT is so radically different for LZ-based
techniques that the symbol encoder is probably secondary.

> Arithmetic is known to be superior; but there could of course be
> other changes from bzip1 to bzip2.

Not any significant ones.

-- 
Mathematics is the supreme nostalgia of our time.




More information about the Mercurial-devel mailing list