Experiments with LZMA compression of bundles

Fri May 20 17:24:52 CDT 2011

On 2011-05-20 16:55, Matt Mackall wrote:
>On Fri, 2011-05-20 at 23:36 +0200, Sune Foldager wrote:
>> On 2011-05-20 16:00, Matt Mackall wrote:
>> >On Fri, 2011-05-20 at 22:50 +0200, Sune Foldager wrote:
>> >> On 20-05-2011 13:34, Benoit Boissinot wrote:
>> >> > Yes, experimenting is cool (for fast server side compression, with
>> >> > better compression level than zlib, snappy/zippy would be interesting
>> >> > too).
>> >>
>> >> Maybe... is it LZ-based? Then maybe liblzma can do the same with proper
>> >> parameters set. I think the key advantage with lzma despite the
>> >> increased dictionary sizes etc., is the use of arithmetical compression
>> >> in the entropy stage (like bzip1 did).
>> >
>> >I'd be surprised if that were true. bzip2's alternative to arithmetic
>> >encoding is pretty decent.
>>
>> Hm, no you mean bzip2's alternative to Lempel-Ziv, namely the Burrows-Wheeler
>> transform. After applying that, and some other stuff, bzip2 subsequently
>> uses standard Huffman encoding:
>>
>> "bzip2's ancestor bzip used arithmetic coding instead of Huffman.
>> The change was made because of a software patent restriction."
>
>No, I actually did mean the post-transform entropy encoder, but I
>remembered it being something more clever than Huffman.
>
>But my claim was not that arithmetic compression was the distinguishing
>factor between bzip/bzip2 (it is, of course), but that arithmetic coding
>is the distinguishing factor between bzip2 and lzma (which seems to be
>what you're saying above). BWT is so radically different for LZ-based
>techniques that the symbol encoder is probably secondary.

Yes, is might be [secondary]. I haven't actually seen comparisons between
bzip1 and bzip2.