[PATCH 0 of 1] Wrong distance calculation in revlog causes huge manifests

Matt Mackall mpm at selenic.com
Wed Jul 11 18:19:43 CDT 2012


On Wed, 2012-07-11 at 12:54 +0200, Friedrich Kastner-Masilko wrote:
> Hi there,
> 
> recently I had the opportunity (or the need, depending on how you see it) to convert a recent Linux kernel
> repository from Git to Mercurial via Hg-Git. I did so on our company server with plenty of RAM and cores to
> spare, so I didn't worry about size or duration. The Mercurial version is 2.0.1, Hg-Git is 0.3.2.
> 
> The Git repository has a total size of about 650MB (without working copy, of course). After about 8 hours of
> conversion with Hg-Git, the Mercurial repository had a size of 5GB. It was interesting to see that the
> manifest data file alone topped out at 4.1GB, while the data directory itself is a reasonable 960MB. All this
> without the generaldelta format.
> 
> I then converted the resulting HG repo via "hg clone -U --config format.generaldelta=1 kernel kernel2" to the
> generaldelta format, in the hope of seeing a major reduction in manifest size. Unfortunately, it still created
> a 3.4GB repository, with the manifest data again dominating at 2.4GB and the data directory at 883MB.
> 
> I decided to take a look at the manifest index, and noticed very frequent full snapshots, despite the changes
> leading to these snapshots being minimal. All the occurrences used a far-away parent, though, so I took a
> look at the code to see what's going on. I found the distance calculation in revlog's
> _addrevision.builddelta(rev) to be wrong.
> 
> After applying the attached patch, another conversion to generaldelta format resulted in a 1.7GB repository
> with a 691MB manifest and 867MB of data. This much overhead compared to Git's aggressive pack mechanism I
> would have expected anyway.
> 
> Unfortunately, I did not time the conversions, so it could very well be that the patch introduced a serious
> time cost. I doubt it, because it follows the same algorithm that chainbase uses and is fairly
> straightforward.

If I understand what you've done here, this is probably incorrect.

In the non-generaldelta code, we bound both I/O and reconstruction time
based on a small multiple of the file size. Before this patch, we strictly
bounded I/O, and thus reconstruction time. After this patch,
reconstruction time is bounded, but I/O is not.
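
To make the distinction concrete, here's a rough sketch of the two
distance metrics at issue. The accessors are approximations of the
revlog API, not the actual _addrevision/builddelta code:

    def span_distance(revlog, base, rev, newlen):
        # Physical span on disk from the chain's base to the end of the
        # new delta. Capping this (the pre-patch behavior) bounds the
        # size of the single contiguous read, and with it reconstruction
        # time.
        return revlog.end(rev) - revlog.start(base) + newlen

    def chain_distance(revlog, base, rev, newlen):
        # Sum of the delta sizes that actually belong to the chain,
        # which is roughly what the patch measures. This bounds the CPU
        # cost of applying deltas, but not how wide an on-disk window
        # the single read has to cover.
        total = newlen
        r = rev
        while r != base:
            total += revlog.length(r)
            r = revlog.deltaparent(r)  # with generaldelta, this can jump far back
        return total + revlog.length(base)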

Note that we intentionally do _a single read_ to pull in the entire span
of deltas, including data we're skipping. This is important for
performance on spinning media: reads are blocking and you have to wait
for every read to complete before issuing another one. So if we're
reconstructing a large delta chain, we could end up making hundreds to
thousands of small round trips to the disk instead of one big, fast one.
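
As a minimal sketch of what that single-read strategy looks like (with
hypothetical helper names, not the real revlog I/O path):

    def read_chain_chunks(fp, revlog, chain):
        # chain is the list of revisions from the chain's base to the
        # revision being reconstructed, in order.
        start = revlog.start(chain[0])
        end = revlog.end(chain[-1])
        fp.seek(start)
        data = fp.read(end - start)  # one blocking round trip to disk
        chunks = []
        for rev in chain:
            o = revlog.start(rev) - start
            chunks.append(data[o:o + revlog.length(rev)])  # unrelated bytes are discarded
        return chunks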

So, with an unbounded I/O window, we could end up reading in a gigabyte
of deltas (in one read) and then discarding most of them, just to
reconstruct a megabyte of manifest data.

It probably makes sense to have different bounding factors on I/O and
CPU time, and possibly to allow some form of scatter-gather, but given
that neither is there yet, this patch is a problem.
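
Something like the following shape (purely illustrative names and
thresholds, not code that exists anywhere) is what I mean by separate
bounding factors:

    def acceptable_delta(textlen, read_span, chain_cost,
                         io_factor=4, cpu_factor=2):
        # read_span:  bytes covered by the single contiguous read
        # chain_cost: sum of the delta sizes actually applied
        # Cap each independently instead of using one number for both.
        return (read_span <= io_factor * textlen and
                chain_cost <= cpu_factor * textlen)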

By contrast, git just mmaps its pack file and relies on the I/O
scheduler to do the right thing. This sort of works, by virtue of
Linux's adaptive and aggressive read-ahead engine, but it can degrade to
worst-case random I/O behavior when the spacing between deltas in a
chain is large. Also, it's safe to assume that mmap read-ahead works
significantly less well on non-Linux systems.
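
Roughly this (a simplified illustration, not git's actual pack access
code):

    import mmap

    def read_chain_via_mmap(path, offsets_and_lengths):
        # Map the pack file once and slice each delta out of the mapping.
        # Whether this becomes one big sequential read or many random
        # ones is left entirely to the kernel's read-ahead heuristics.
        with open(path, "rb") as f:
            m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            try:
                return [bytes(m[off:off + ln]) for off, ln in offsets_and_lengths]
            finally:
                m.close()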

-- 
Mathematics is the supreme nostalgia of our time.



