Testing very long delta chains

Gregory Szorc gregory.szorc at gmail.com
Wed Dec 23 20:33:00 CST 2015


On Wed, Dec 23, 2015 at 1:59 PM, Matt Mackall <mpm at selenic.com> wrote:

> On Tue, 2015-12-22 at 23:30 -0800, Gregory Szorc wrote:
> > On Tue, Dec 22, 2015 at 9:41 PM, Matt Mackall <mpm at selenic.com> wrote:
> >
> > > On Tue, 2015-12-22 at 17:27 -0800, Gregory Szorc wrote:
> > > > https://www.mercurial-scm.org/wiki/BigRepositories has been updated
> > > > with a link to
> > > > https://hg.mozilla.org/users/gszorc_mozilla.com/mozilla-central-aggressivemergedeltas,
> > > > which is a generaldelta clone of mozilla-central with
> > > > format.aggressivemergedeltas enabled.
> > > >
> > > > The last manifest delta chain in this repo is over 45,000 entries
> > > > deep and it makes for a good benchmark for testing revlog reading
> > > > performance.
> > > >
> > > > Remember: `hg clone --uncompressed` to preserve the delta chains
> > > > from the server or your client will recompute them as part of
> > > > applying the changegroup.
> > >
> > > Without my threaded zlib hack:
> > >
> > > $ hg perfmanifest 277045
> > > ! wall 0.749929 comb 0.740000 user 0.730000 sys 0.010000 (best of 13)
> > >
> > > (25% CPU usage on a CPU with 4 threads)
> > >
> > > With my threaded zlib hack (threads = 4):
> > >
> > > $ hg perfmanifest 277045
> > > ! wall 0.480251 comb 1.090000 user 0.990000 sys 0.100000 (best of 20)
> > >
> > > (50% CPU usage on a CPU with 4 threads)
> > >
> >
> > Assuming 100% CPU usage, that's still ~240ms, which feels a bit steep. I
> > think 100ms should be the upper limit.
>
> That's not a particularly comfortable limit given:
>
> $ hg debugdata -m 277045 | gzip -9 > a.gz
> $ time gunzip < a.gz > /dev/null
>
> real    0m0.142s
> user    0m0.140s
> sys     0m0.000s
>
> That's only decompressing 4MB:
>
> $ wc a.gz
>   16267   89037 4110122 a.gz
>
> (and is inherently hard to multithread)
>
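
For anyone who wants to repeat the gunzip measurement from Python, here is
a minimal sketch. It assumes the standard zlib module is a fair stand-in
for gzip -9 / gunzip; the helper name time_decompress and the synthetic
manifest-like data are hypothetical. Real numbers need the output of
`hg debugdata -m 277045` as input.

```python
import time
import zlib


def time_decompress(payload: bytes, repeats: int = 5) -> float:
    """Return the best wall-clock time, in seconds, to decompress payload."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        zlib.decompress(payload)
        best = min(best, time.perf_counter() - start)
    return best


def fake_manifest(entries: int) -> bytes:
    """Build stand-in data: 'path NUL 40-hex-digit value' lines.

    This is NOT a real manifest, only something with a similar shape.
    """
    lines = (
        "dir/sub/file%06d.c\0%040x\n" % (i, i * 2654435761)
        for i in range(entries)
    )
    return "".join(lines).encode("ascii")


if __name__ == "__main__":
    raw = fake_manifest(130000)
    packed = zlib.compress(raw, 9)
    elapsed = time_decompress(packed)
    print("raw %d bytes, compressed %d bytes" % (len(raw), len(packed)))
    print("best decompress time: %.3f s" % elapsed)
```

Taking the best of several runs mirrors the "best of N" reporting of
`hg perfmanifest`; a single run is easily skewed by a cold cache.
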
> But Mercurial wants to store chains up to 2x the uncompressed size:
>
> $ gunzip < a.gz | wc
>  130845  130854 12868485
>
> So even with threading, that leaves very little room to achieve decent
> compression, which very much depends on deltas.
>
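
The arithmetic behind that argument can be sketched as follows, using only
the figures quoted above. Two assumptions are mine and not from the
thread: decompression scales perfectly across threads, and "2x the
uncompressed size" is read as the amount of stored chain data that may
have to be decompressed. The function name decompress_budget is
hypothetical.

```python
# Figures quoted in the thread.
COMPRESSED_BYTES = 4110122     # size of a.gz, from `wc a.gz`
UNCOMPRESSED_BYTES = 12868485  # from `gunzip < a.gz | wc`
GUNZIP_SECONDS = 0.142         # `real` time reported for gunzip


def decompress_budget(seconds: float, threads: int) -> float:
    """Compressed bytes that can be decompressed within `seconds`.

    Assumes the single-threaded gunzip rate above and perfect scaling
    across `threads`, which is optimistic.
    """
    rate = COMPRESSED_BYTES / GUNZIP_SECONDS  # bytes per second, one thread
    return rate * seconds * threads


# Assumed reading: a chain may hold up to 2x the uncompressed text size.
CHAIN_LIMIT_BYTES = 2 * UNCOMPRESSED_BYTES

if __name__ == "__main__":
    budget = decompress_budget(0.100, 4)
    print("budget in 100ms on 4 threads: %.1f MB" % (budget / 1e6))
    print("chain size limit:             %.1f MB" % (CHAIN_LIMIT_BYTES / 1e6))
    print("budget / limit:               %.2f" % (budget / CHAIN_LIMIT_BYTES))
```

Under those assumptions a 100ms budget on 4 threads covers roughly 11.6 MB
of compressed input against a chain limit of about 25.7 MB, so less than
half of a maximum-size chain would fit.
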
> > From C, this will not be fun because Windows.
>
> Simple worker threads on Windows aren't all that painful.
>
> > Half serious question: what are your thoughts on writing this in Rust?
>
> Sanity check: Rust isn't even in Debian-unstable yet


Apparently it's in Debian testing. If nothing else, Firefox shipping a Rust
component should be a forcing function to get distributions to offer Rust.


> and we have an important
> platform where getting a working C compiler is still a headache.
>

Is this Windows?

Everyone's observations about the immaturity of Rust's packaging situation
are accurate. That being said, I argue that Rust's distribution situation
is *simpler* than Python's because there is no language runtime dependency
(Python). Yes, you still have shared library dependency issues, but that's
true of Python C extensions today.

Binary distribution of Mercurial *should* be a solved problem on Windows
and OS X, especially now that it looks like we can generate wheels properly
(I need to talk to someone about uploading wheels to PyPI for the 3.7
release).

Binary distribution on Unixen is more difficult. We can partially solve
that by publishing RPMs, debs, etc where needed. (I argue we should be
doing more of this since distros are lethargic about updating to the latest
Mercurial release.) We already have mechanisms to produce RPMs and debs
compatible with ancient distros (like CentOS 6). We even bundle Python 2.7
in some of them! I'd really prefer to stay out of the packaging game too.
But if distros are going to move at a glacial pace, I can argue we have a
responsibility to our users to provide them the opportunity to easily
install a modern Mercurial. I fear that means providing binary packages for
Unixen.

Source distribution on Unixen is just a PITA, both for Python C extensions
and Rust. I agree that Rust is behind Python here. This should change as
Rust's popularity increases. But it will take a while.