Testing very long delta chains

Sean Farley sean at farley.io
Wed Dec 23 22:38:24 CST 2015


Gregory Szorc <gregory.szorc at gmail.com> writes:

> On Wed, Dec 23, 2015 at 1:59 PM, Matt Mackall <mpm at selenic.com> wrote:
>
>> On Tue, 2015-12-22 at 23:30 -0800, Gregory Szorc wrote:
>> > On Tue, Dec 22, 2015 at 9:41 PM, Matt Mackall <mpm at selenic.com> wrote:
>> >
>> > > On Tue, 2015-12-22 at 17:27 -0800, Gregory Szorc wrote:
>> > > > https://www.mercurial-scm.org/wiki/BigRepositories has been updated
>> > > > with a link to
>> > > > https://hg.mozilla.org/users/gszorc_mozilla.com/mozilla-central-aggressivemergedeltas,
>> > > > which is a generaldelta clone of mozilla-central with
>> > > > format.aggressivemergedeltas enabled.
>> > > >
>> > > > The last manifest delta chain in this repo is over 45,000 entries deep
>> > > > and it makes for a good benchmark for testing revlog reading performance.
>> > > >
>> > > > Remember: `hg clone --uncompressed` to preserve the delta chains from the
>> > > > server or your client will recompute them as part of applying the
>> > > > changegroup.
>> > >
>> > > Without my threaded zlib hack:
>> > >
>> > > $ hg perfmanifest 277045
>> > > ! wall 0.749929 comb 0.740000 user 0.730000 sys 0.010000 (best of 13)
>> > >
>> > > (25% CPU usage on a CPU with 4 threads)
>> > >
>> > > With my threaded zlib hack (threads = 4):
>> > >
>> > > $ hg perfmanifest 277045
>> > > ! wall 0.480251 comb 1.090000 user 0.990000 sys 0.100000 (best of 20)
>> > >
>> > > (50% CPU usage on a CPU with 4 threads)
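
The thread does not show how the threaded zlib hack is implemented. As a rough illustration of the general idea only, here is a minimal Python sketch that decompresses independent zlib chunks on a small thread pool. The function name `decompress_chunks` and the sample data are hypothetical and are not taken from Mercurial's code.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def decompress_chunks(chunks, threads=4):
    """Decompress independent zlib-compressed chunks, preserving their order.

    Hypothetical helper for illustration; not Mercurial's revlog code.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(zlib.decompress, chunks))


# Hypothetical stand-ins for the compressed entries of one delta chain.
originals = [(b"manifest entry %d\n" % i) * 1000 for i in range(8)]
compressed = [zlib.compress(text) for text in originals]
restored = decompress_chunks(compressed)
```

Only the decompression step is spread across threads here. Applying the resulting deltas in order would still be sequential, which may be one reason the measured CPU usage stays well below 100%.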
>> > >
>> >
>> > Assuming 100% CPU usage, that's still ~240ms, which feels a bit steep. I
>> > think 100ms should be the upper limit.
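
The ~240ms figure follows from the threaded timing quoted above. A small sketch of that arithmetic, assuming perfectly linear scaling from 50% to 100% CPU usage (an idealization; the helper name is made up for illustration):

```python
def projected_wall(wall_seconds, cpu_fraction_used):
    """Estimate wall time if the same work used 100% of the available CPU.

    Assumes perfect linear scaling, which real workloads rarely achieve.
    """
    return wall_seconds * cpu_fraction_used


# Threaded run from the thread: wall 0.480251s at about 50% CPU usage.
ideal = projected_wall(0.480251, 0.50)
print(f"{ideal * 1000:.0f} ms")
```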
>>
>> That's not a particularly comfortable limit given:
>>
>> $ hg debugdata -m 277045 | gzip -9 > a.gz
>> $ time gunzip < a.gz > /dev/null
>>
>> real    0m0.142s
>> user    0m0.140s
>> sys     0m0.000s
>>
>> That's only decompressing 4MB:
>>
>> $ wc a.gz
>>   16267   89037 4110122 a.gz
>>
>> (and is inherently hard to multithread)
>>
>> But Mercurial wants to store chains up to 2x the uncompressed size:
>>
>> $ gunzip < a.gz | wc
>>  130845  130854 12868485
>>
>> So even with threading, that leaves very little room to achieve decent
>> compression, which very much depends on deltas.
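
The numbers above can be put side by side with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes the single `gunzip` timing is representative, and that chain data totalling 2x the uncompressed size would be processed at the same output rate, which is a simplification.

```python
compressed_bytes = 4110122      # from `wc a.gz`
uncompressed_bytes = 12868485   # from `gunzip < a.gz | wc`
gunzip_seconds = 0.142          # "real" time reported by `time gunzip`

# Compression ratio achieved by gzip -9 on this manifest.
ratio = uncompressed_bytes / compressed_bytes

# Output throughput of gunzip, in bytes per second.
throughput = uncompressed_bytes / gunzip_seconds

# How much output a 100ms budget would allow at that rate.
budget_bytes = 0.100 * throughput

# Time to process 2x the uncompressed size at that rate (assumption).
chain_seconds = (2 * uncompressed_bytes) / throughput

print(f"ratio {ratio:.2f}x, {throughput / 1e6:.1f} MB/s")
print(f"100ms budget covers about {budget_bytes / 1e6:.1f} MB")
print(f"2x chain at that rate: about {chain_seconds * 1000:.0f} ms")
```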
>>
>> > From C, this will not be fun because Windows.
>>
>> Simple worker threads on Windows aren't all that painful.
>>
>> > Half serious question: what are your thoughts on writing this in Rust?
>>
>> Sanity check: Rust isn't even in Debian-unstable yet
>
>
> Apparently it's in Debian testing. If nothing else, Firefox shipping a Rust
> component should be a forcing function to get distributions to offer Rust.
>
>
>> and we have an important
>> platform where getting a working C compiler is still a headache.
>>
>
> Is this Windows?
>
> Everyone's observations about the immaturity of Rust's packaging situation
> are accurate. That being said, I argue that Rust's distribution situation
> is *simpler* than Python's because there is no language runtime dependency
> (Python). Yes, you still have shared library dependency issues, but that's
> true of Python C extensions today.
>
> Binary distribution of Mercurial *should* be a solved problem on Windows
> and OS X, especially now that it looks like we can generate wheels properly
> (I need to talk to someone about uploading wheels to PyPI for the 3.7
> release).
>
> Binary distribution on Unixen is more difficult. We can partially solve
> that by publishing RPMs, debs, etc where needed. (I argue we should be
> doing more of this since distros are lethargic about updating to the latest
> Mercurial release.) We already have mechanisms to produce RPMs and debs
> compatible with ancient distros (like CentOS 6). We even bundle Python 2.7
> in some of them! I'd really prefer to stay out of the packaging game too.
> But if distros are going to move at a glacial pace, I can argue we have a
> responsibility to our users to provide them the opportunity to easily
> install a modern Mercurial. I fear that means providing binary packages for
> Unixen.
>
> Source distribution on Unixen is just a PITA, both for Python C extensions
> and Rust. I agree that Rust is behind Python here. This should change as
> Rust's popularity increases. But it will take a while.

That being said, can we un-bitrot Greg Ward's C code?

https://bitbucket.org/gward/xrevlog

