Experiments with LZMA compression of bundles

Sune Foldager cryo at cyanite.org
Fri May 20 05:20:50 CDT 2011


On 2011-05-20 10:17, Benoit Boissinot wrote:
>On Fri, May 20, 2011 at 9:20 AM, Sune Foldager <cryo at cyanite.org> wrote:
>> Just for fun, I hacked in LZMA compression support in Mercurial's bundle
>> system, so I could test how much compression that might give us. I include
>> the results below.
>>
>> I used pyliblzma, which in turn depends on liblzma2 (used by xz). I used
>> default settings, which, as I understand, use a rather large amount of
>> memory, but this can be tuned. Hacking it into Mercurial was quite simple,
>> although the code that handles bundles is scattered across several
>> modules, so I only made it work for on-disk bundles.
>
>Note that the vast majority of bundle operations are over http/ssh; in
>that case bzip2 won't be used, so the fair comparison is against zlib.

It's a bit messy, really, as far as I can see:

- hg bundle and hg push http:// write/send full bundles with proper compression.
   For the http push case, a capability determines the supported types.
- hg push ssh:// sends headerless bundles with no compression.
- hg pull (both getbundle and changegroup(subset)) receive headerless bundles,
   with zlib compression on top of it for http (and none for ssh).
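
To make the three cases above concrete, here is a rough Python sketch of the framings as I understand them. It is not Mercurial's actual code: the function names are made up for illustration, and the six-byte headers (HG10UN, HG10GZ) are my assumption about how a headered bundle names its compression.

```python
import zlib

# Assumed header values: "HG10" magic plus a two-letter compression tag.
HEADER_UNCOMPRESSED = b"HG10UN"
HEADER_ZLIB = b"HG10GZ"


def headered_bundle(changegroup: bytes, compress: bool = True) -> bytes:
    """Full bundle: the header names the compression, the payload follows."""
    if compress:
        return HEADER_ZLIB + zlib.compress(changegroup)
    return HEADER_UNCOMPRESSED + changegroup


def http_pull_stream(changegroup: bytes) -> bytes:
    """Headerless bundle, with the transport adding a zlib wrapper."""
    return zlib.compress(changegroup)


def ssh_stream(changegroup: bytes) -> bytes:
    """Headerless bundle, sent as-is with no compression."""
    return changegroup


def read_headered(data: bytes) -> bytes:
    """Recover the raw changegroup from a headered bundle."""
    header, payload = data[:6], data[6:]
    if header == HEADER_ZLIB:
        return zlib.decompress(payload)
    if header == HEADER_UNCOMPRESSED:
        return payload
    raise ValueError("unknown bundle header: %r" % header)
```

Under these assumptions, the zlib-wrapped pull stream and a headered zlib bundle differ only by the six header bytes, which is why the split looks arbitrary to me.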

Is this for historic reasons? It seems unnecessarily complicated. Why wouldn't
we always use headered bundles, say, and have pull use HG10GZ? Why wrap
uncompressed headerless bundles in a zlib stream instead?

and...

Why does push (the unbundle remote command) send full, headered bundles with
arbitrary compression in the bundle, and no wrapper compression?

Is this because pushes are often smaller than pulls? But that still doesn't
address why we don't pull GZ bundles.

Nomenclature is also a bit inconsistent, with several related but non-identical
things called "bundle" (with or without header) and "changegroup" (either a set
of changes to a single revlog, or a headerless bundle).

>My guess is that the CPU usage on the server will show a much bigger difference.

Yes, for sure; lzma is a bit on the expensive side for compression. Hardware
does get faster, though.
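
For anyone who wants to get a feel for the size/CPU trade-off without patching Mercurial, something like the following would do as a rough sketch. Note that it uses the lzma module from the Python 3 standard library rather than pyliblzma (which is what I actually used), runs all three compressors at their default settings, and the compare() helper is just a name I made up. Numbers from it say nothing definitive about real bundles.

```python
import bz2
import lzma
import time
import zlib


def compare(data: bytes) -> dict:
    """Return {name: (compressed_size, seconds)} for each compressor."""
    codecs = [
        ("zlib", zlib.compress, zlib.decompress),
        ("bz2", bz2.compress, bz2.decompress),
        ("lzma", lzma.compress, lzma.decompress),
    ]
    results = {}
    for name, compress, decompress in codecs:
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # Sanity check: the round trip must give back the input unchanged.
        if decompress(packed) != data:
            raise RuntimeError("round trip failed for " + name)
        results[name] = (len(packed), elapsed)
    return results
```

Feed it the contents of an uncompressed bundle file (for example one written with hg bundle --type none) and print the results to compare sizes and compression times on your own repository.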

Anyway, I am not proposing adding LZMA in at this time, or anything like that.
It was just some experiments I thought might interest people.

/Sune

