Native support for lz4?

Gregory Szorc gregory.szorc at gmail.com
Fri Aug 5 17:58:14 EDT 2016


On Fri, Aug 5, 2016 at 2:03 PM, Gregory Szorc <gregory.szorc at gmail.com>
wrote:

> On Fri, Aug 5, 2016 at 12:20 PM, Siddharth Agarwal <sid at less-broken.com>
> wrote:
>
>> On 8/5/16 12:09, Augie Fackler wrote:
>>
>>> On Fri, Aug 05, 2016 at 10:48:03AM -0700, Gregory Szorc wrote:
>>>
>>>> Facebook introduced an lz4revlog extension a while ago. I think lz4 has
>>>> some compelling performance advantages over zlib for revlog storage and
>>>> wire protocol compression.
>>>>
>>>> I'd like to start a discussion about bundling the lz4 C implementation
>>>> as part of the Mercurial distribution and supporting lz4 for revlogs
>>>> and wire protocol compression out of the box.
>>>>
>>>> I'm not proposing requiring lz4 or making lz4 the default. I mostly care
>>>> about making lz4 accessible to more users. (The 3rd party lz4revlog
>>>> extension is difficult to use because you need a separate Python package
>>>> providing lz4 support. Plus, lz4revlog isn't using the proper lz4
>>>> framing encoding and I'm hesitant to recommend its use because of this.)
>>>>
>>> Yes, we should definitely not use the existing python-lz4 in hg itself
>>> - the one-off framing format makes me sad.
>>>
>>
>> Agreed.
>>
>>
>>>> I'd also entertain scope bloating the conversation to include other
>>>> compression formats. Once you support 2, you need to support N, right?
>>>> I've been taking an interest in zstd and I'd be curious if Facebook or
>>>> others have any plans to add support to Mercurial.
>>>>
>>> I've been meaning to at least squint at this, but lack the round
>>> tuits. I'm definitely open to this line of inquiry in general,
>>> including the idea of bundling lz4 or adding better hooks for it in
>>> core.
>>>
>>
>> We may want to wait for zstd. It's just plain better than gzip on every
>> axis, and from what I gather it's *extremely* close to being ready.
>>
>> I agree that going from 1->2 is harder than going from 2->N, but we
>> really must avoid recompressing on pulls. It's not clear to me how that
>> would work in a world where users can pick between Mercurial repositories
>> compressed with any of lz4, zstd or gzip.
>
>
>
> Yes, avoiding excessive decompression/compression on the server would be
> important. But consider how poorly we currently do things.
>
> Today, when you request a bundle on the server, the server first obtains a
> changegroup. The changegroup contains a series of mdiff.textdiff deltas for all
> the changelog, manifest, and filelog data. These are obtained by
> decompressing full text revisions from the revlog and generating a new
> mdiff (there is no fastpath to reuse deltas from the revlogs AFAICT). The
> changegroup is stuffed into a "bundle" container and the resulting stream
> of bits gets zlib compressed by the HTTP protocol but stays uncompressed
> over SSH (we defer compression to the SSH protocol). So, we're already
> incurring a zlib decompress + compress for bundle retrieval on the server
> today. We could certainly optimize this, but doing {lz4, zstd, zlib} ->
> {lz4, zstd, zlib} on the server in the future would be no worse than zlib
> -> zlib today.
>

Things aren't as bad as I claimed. There is a fast path that avoids
computing the 2 fulltexts and an mdiff for every revlog revision in the
changegroup. That code is in revlog.revdiff(). However, that returns the
decompressed revlog chunk. So we still effectively roundtrip through zlib
on the server as part of serving pulls. We could potentially avoid the
decompress, but that would require yet another changegroup version.
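
To make the round trip concrete, here is a minimal Python sketch. It is not
Mercurial code: store_chunks and serve_stream are hypothetical names, and a
list of zlib-compressed byte strings stands in for revlog chunks. It only
shows that every chunk is inflated on read and the combined stream is
deflated again for the wire.

```python
import zlib


def store_chunks(deltas):
    """Hypothetical stand-in for revlog storage: one zlib blob per delta."""
    return [zlib.compress(d) for d in deltas]


def serve_stream(stored_chunks):
    """Inflate each stored chunk, then deflate the combined stream.

    This mimics the shape of the server-side work described above,
    assuming an HTTP-style zlib-compressed response.
    """
    wire = zlib.compressobj()
    out = []
    for chunk in stored_chunks:
        raw = zlib.decompress(chunk)    # the fast path still hands back raw bytes
        out.append(wire.compress(raw))  # recompressed for the wire
    out.append(wire.flush())
    return b"".join(out)


# Made-up delta payloads, purely for illustration.
deltas = [(b"delta-%d " % i) * 50 for i in range(3)]
stored = store_chunks(deltas)
response = serve_stream(stored)
```

Skipping the decompress would mean sending the stored blobs as-is, which is
why the receiving side would need a format that knows the chunks are already
compressed.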



> If you want to trade disk space for CPU time, we could potentially run
> side-by-side stores with N representations of data in different compression
> formats. I think Durham's proposed generic store API could facilitate that.
>
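
A rough sketch of the side-by-side idea, assuming nothing about Durham's
actual store API: SideBySideStore and CODECS are hypothetical names, and
bz2 and lzma from the Python standard library stand in for lz4 and zstd,
which are not in the stdlib. Each revision is compressed once per format at
write time, so a read in any configured format needs no recompression.

```python
import bz2
import lzma
import zlib

# Stand-in codecs; a real deployment would plug in lz4/zstd bindings here.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


class SideBySideStore:
    """Hypothetical store keeping one compressed copy per configured format."""

    def __init__(self, formats):
        self._formats = tuple(formats)
        self._blobs = {name: {} for name in self._formats}

    def add(self, key, data):
        # Pay the CPU cost once, at write time, for every format.
        for name in self._formats:
            compress, _ = CODECS[name]
            self._blobs[name][key] = compress(data)

    def get_compressed(self, key, fmt):
        # Serve the stored bytes directly: no decompress, no recompress.
        return self._blobs[fmt][key]

    def disk_usage(self):
        # Total bytes across all formats: the disk-space side of the trade.
        return sum(len(b) for per in self._blobs.values() for b in per.values())


store = SideBySideStore(["zlib", "lzma"])
payload = b"manifest line\n" * 200
store.add("rev0", payload)
```

The disk cost grows roughly linearly with the number of formats kept, which
is the trade against server CPU time.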