Testing very long delta chains

Matt Mackall mpm at selenic.com
Wed Dec 23 15:59:59 CST 2015


On Tue, 2015-12-22 at 23:30 -0800, Gregory Szorc wrote:
> On Tue, Dec 22, 2015 at 9:41 PM, Matt Mackall <mpm at selenic.com> wrote:
> 
> > On Tue, 2015-12-22 at 17:27 -0800, Gregory Szorc wrote:
> > > https://www.mercurial-scm.org/wiki/BigRepositories has been updated with a
> > > link to
> > > https://hg.mozilla.org/users/gszorc_mozilla.com/mozilla-central-aggressivemergedeltas,
> > > which is a generaldelta clone of mozilla-central with
> > > format.aggressivemergedeltas enabled.
> > > 
> > > The last manifest delta chain in this repo is over 45,000 entries deep
> > > and it makes for a good benchmark for testing revlog reading performance.
> > > 
> > > Remember: `hg clone --uncompressed` to preserve the delta chains from the
> > > server or your client will recompute them as part of applying the
> > > changegroup.
> > 
> > Without my threaded zlib hack:
> > 
> > $ hg perfmanifest 277045
> > ! wall 0.749929 comb 0.740000 user 0.730000 sys 0.010000 (best of 13)
> > 
> > (25% CPU usage on a CPU with 4 threads)
> > 
> > With my threaded zlib hack (threads = 4):
> > 
> > $ hg perfmanifest 277045
> > ! wall 0.480251 comb 1.090000 user 0.990000 sys 0.100000 (best of 20)
> > 
> > (50% CPU usage on a CPU with 4 threads)
> > 
> 
> Assuming 100% CPU usage, that's still ~240ms, which feels a bit steep. I
> think 100ms should be the upper limit.

That's not a particularly comfortable limit given:

$ hg debugdata -m 277045 | gzip -9 > a.gz
$ time gunzip < a.gz > /dev/null

real	0m0.142s
user	0m0.140s
sys	0m0.000s

That's only decompressing 4MB of compressed data:

$ wc a.gz
  16267   89037 4110122 a.gz

(and is inherently hard to multithread)

But Mercurial wants to store chains up to 2x the uncompressed size:

$ gunzip < a.gz | wc
 130845  130854 12868485

So even with threading, that leaves very little room to achieve decent
compression, which very much depends on deltas.
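
The arithmetic behind that argument can be checked with a few lines of Python. This is only a rough sketch: every constant is copied from the measurements quoted in this thread, except THREADS, which assumes ideal scaling across all four hardware threads (the benchmark above did not achieve that). The function name is made up for illustration.

```python
# Figures copied from the measurements quoted in this thread.
COMPRESSED_BYTES = 4110122      # wc a.gz
UNCOMPRESSED_BYTES = 12868485   # gunzip < a.gz | wc
GUNZIP_SECONDS = 0.142          # "real" time reported for gunzip
THREADED_WALL = 0.480251        # perfmanifest wall time with threads = 4
THREADED_CPU_FRACTION = 0.50    # CPU usage reported for that run
TARGET_SECONDS = 0.100          # the proposed upper limit
THREADS = 4                     # ASSUMPTION: perfect scaling on 4 threads


def summarize():
    """Derive the rough budget numbers discussed in the thread."""
    rate = UNCOMPRESSED_BYTES / GUNZIP_SECONDS  # output bytes per second
    return {
        # wall time if the threaded run had used 100% CPU (the ~240ms figure)
        "ideal_wall_seconds": THREADED_WALL * THREADED_CPU_FRACTION,
        "compression_ratio": UNCOMPRESSED_BYTES / COMPRESSED_BYTES,
        "inflate_bytes_per_second": rate,
        # bytes one thread can inflate within the target
        "budget_one_thread": rate * TARGET_SECONDS,
        # same, under the ideal-scaling assumption
        "budget_all_threads": rate * TARGET_SECONDS * THREADS,
        # the "up to 2x the uncompressed size" bound
        "chain_bound_bytes": 2 * UNCOMPRESSED_BYTES,
    }


for name, value in summarize().items():
    print(f"{name:>26}: {value:,.3f}")
```

At the measured single-threaded rate, 100ms is enough to inflate roughly 9MB, which is less than one full manifest text of about 12.9MB. Only under the ideal four-thread assumption does the budget exceed the 2x bound, and not by a wide margin.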

> From C, this will not be fun because Windows.

Simple worker threads on Windows aren't all that painful.
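
For illustration only, here is the general shape of a worker-thread approach, written in Python rather than C. The threaded zlib hack itself is not shown in this thread, so this is not that code: it is a minimal sketch that assumes each chunk in a chain is an independent zlib stream, and inflate_chunks and the sample data are hypothetical. It also relies on CPython's zlib module releasing the GIL while inflating.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def inflate_chunks(chunks, threads=4):
    """Inflate independent zlib streams on a small worker pool.

    Results come back in input order, so a caller could still apply
    the resulting deltas sequentially afterwards.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(zlib.decompress, chunks))


# Hypothetical stand-in data: 64 separately compressed chunks.
raw = [(b"file%d\x00deadbeef\n" % i) * 1000 for i in range(64)]
chunks = [zlib.compress(r) for r in raw]

inflated = inflate_chunks(chunks)
print(len(inflated), "chunks inflated,", sum(map(len, inflated)), "bytes")
```

Only the decompression step is parallel here. Applying the deltas in chain order stays serial, which is one reason a threaded run cannot be expected to reach 100% CPU.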

> Half serious question: what are your thoughts on writing this in Rust?

Sanity check: Rust isn't even in Debian-unstable yet and we have an important
platform where getting a working C compiler is still a headache.

-- 
Mathematics is the supreme nostalgia of our time.


