We have a very large internal repository: 30000+ files at tip, 100+ active branches with regular merges between them, and almost 60000 revisions. Last year we undertook a large refactoring of our code that caused _all_ files in the repository to be moved to new locations (the entire code tree was rearranged into a logical scheme). Some older maintenance branches branched off before the refactor, so they still have the old layout; we still have to regularly merge from these branches into our main trunk, which is refactored.

Since approximately the start of this refactor, our manifest file has been growing out of control, similar to how it behaved prior to generaldelta. Enabling aggressivemergedeltas doesn't appear to help - it shrinks the manifest from 1.3GB to 1GB, which is still far above the ~100MB we had around rev 43000.

Looking at the revlog shows that around that time, the average delta chain length dropped from the low thousands to the low double digits. Many revisions since then are stored as old-style (r-1) deltas instead of being deltified against a parent.

The culprit appears to be the "dest" check in _isgooddelta, which often goes above textlen*4 due to intervening revisions from other branches, and causes otherwise good deltas to be discarded. According to a comment in the code, the "dest" check is supposed to provide an upper bound on the I/O needed to read a revision later, but it is calculated with an algorithm that gives a reasonable upper bound only for the old (r-1) delta format. With generaldelta, it erroneously counts unrelated revisions lying between members of the delta chain, and so yields an unreasonably high upper bound. Thus many good deltas are rejected, and the resulting suboptimal storage spreads the delta chains even further apart in the revlog, causing yet more false positives from this check, and thus even worse compression. Eventually, performance degrades to what it was before generaldelta.
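To make the failure mode concrete, here is a small illustrative sketch (names and numbers are mine, not Mercurial's actual revlog code) of why a physical-distance bound misbehaves under generaldelta: revisions are laid out on disk in insertion order, so a delta chain's members need not be adjacent, and the byte span from the chain base to the new revision can include arbitrarily many unrelated revisions from other branches.

```python
# (offset, length) of each revision as stored in the revlog data file.
# Revisions 0, 1 and 4 form one delta chain; 2 and 3 are unrelated
# activity from other branches that happens to sit between them on disk.
layout = {
    0: (0, 1000),        # chain base (full text)
    1: (1000, 50),       # delta against rev 0 -- our chain
    2: (1050, 90000),    # unrelated branch activity...
    3: (91050, 80000),   # ...more unrelated revisions
    4: (171050, 60),     # delta against rev 1 -- our chain again
}

def span(base, tip):
    """Bytes between the start of `base` and the end of `tip` on disk."""
    start = layout[base][0]
    end = layout[tip][0] + layout[tip][1]
    return end - start

# Actual data needed to restore rev 4: 1000 + 50 + 60 = 1110 bytes.
# A distance-style bound instead sees the whole on-disk span:
print(span(0, 4))   # 171110 -- far beyond textlen * 4 for a small file
```

With the old r-1 format every chain member was stored contiguously, so this span really did approximate the I/O cost; with generaldelta it mostly measures other branches' data.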
A proof-of-concept fix (simply dropping the dest check) allows the manifest to shrink to ~150MB, and further down to ~60MB with aggressivemergedeltas enabled.
Oops, my bad. "Dist" check, not "dest" check.
This would cause a perf regression unless bug 5482 is fixed.
Yes, _isgooddelta() is wrong by counting the delta chain length by assuming r-1 [when using generaldelta]. I can see how this would blow up the size of a revlog with many heads by introducing more delta chains. Note that delta chain computation can take ~1ms and that can add up when doing it tens of thousands of times (such as when applying changegroups). You'll need to cache the delta chain length like is done for the base revision in chainbase() or performance will suffer. You can probably change the base cache to a (baserev, length) tuple instead of just an int, rename chainbase(), and change all the plumbing. Send a patch and I'll look at it.
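The caching suggested above (replace the plain base-revision cache with a (baserev, length) tuple so chain-length queries don't rewalk the chain) could look roughly like this. This is a hedged sketch with illustrative names, not Mercurial's actual chainbase() plumbing:

```python
# Sketch: memoize (baserev, chainlen) per revision instead of only the
# chain base, so repeated chain-length queries are O(1) after the first walk.
class ChainInfo:
    def __init__(self, deltaparent):
        # deltaparent[rev] = revision this rev is a delta against,
        # or None if rev is stored as a full text.
        self._deltaparent = deltaparent
        self._cache = {}  # rev -> (baserev, chainlen)

    def chaininfo(self, rev):
        if rev in self._cache:
            return self._cache[rev]
        # Walk toward the base, stopping early at any cached ancestor.
        chain = []
        r = rev
        while self._deltaparent[r] is not None and r not in self._cache:
            chain.append(r)
            r = self._deltaparent[r]
        if r in self._cache:
            base, length = self._cache[r]
        else:
            base, length = r, 0
        # Fill the cache for every revision we walked through.
        for c in reversed(chain):
            length += 1
            self._cache[c] = (base, length)
        return self._cache.get(rev, (base, length))

# rev 0 is a full text; revs 1..4 each delta against the previous revision
dp = {0: None, 1: 0, 2: 1, 3: 2, 4: 3}
ci = ChainInfo(dp)
print(ci.chaininfo(4))  # (0, 4)
print(ci.chaininfo(2))  # (0, 2) -- served from the cache
```

The point of the tuple is that one walk from the tip populates the answer for every intermediate revision, which matters when unbundling touches tens of thousands of revisions.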
We are already also measuring the correct delta chain length: that's the 'compresseddeltalen' check, which we limit to 2x the plaintext length. The comment states that 'dist' was kept (at 4x the plaintext length) because it bounds I/O. This is because we always read a delta chain as a single contiguous block, which is extremely wasteful for "sparse" delta chains (e.g. on rarely-written branches of a highly active repository) - see bug 5482, which I've marked as blocking this bug. If you feel we can live with the extra I/O load, I'll drop the dependency on 5482 and send a patch dropping the 'dist' check.
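A minimal sketch of the two bounds being contrasted here (illustrative names; the real logic lives in revlog._isgooddelta). Only the first, 'dist'-style bound depends on on-disk layout and therefore misfires under generaldelta; 'compresseddeltalen' measures the chain's real payload:

```python
def is_good_delta(dist, compresseddeltalen, textlen):
    """Mimic the two acceptance criteria discussed in this thread."""
    # 'dist' bound: physical byte span from the chain base's start offset
    # to the end of the candidate revision. Counts unrelated revisions
    # stored in between under generaldelta.
    if dist > textlen * 4:
        return False
    # 'compresseddeltalen' bound: actual total size of the chain's deltas.
    if compresseddeltalen > textlen * 2:
        return False
    return True

# A 10 KB manifest revision whose chain deltas are cheap (8 KB total) is
# still rejected purely because unrelated revisions widen the disk span:
assert not is_good_delta(dist=500_000, compresseddeltalen=8_000, textlen=10_000)
assert is_good_delta(dist=30_000, compresseddeltalen=8_000, textlen=10_000)
```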
Created attachment 1948 [details] - proof-of-concept fix as an extension

Attached the proof-of-concept fix. Loading this as an extension in 4.1 and then running debugupgraderepo with redeltabase compresses our manifest file from 1.3GB to 31MB. Unfortunately it hurts I/O because of bug 5482.

BTW, mpm earlier WONTFIXed what appears to be the same issue as 5482, as bug 4961. His argument was "disk span is bounded anyway, so no more than 4x overread will happen". It looks like we need to reevaluate this stance, as the disk-span bound is actually severely hurting manifest compression in some instances.
Looks confirmed by indygreg. If you want more eyes than Greg's on your patch, consider mailing the patch to the list.
It's not a patch at this point, it's packaged as an extension. Also, it's probably unacceptable in its current form, since it completely drops the disk span check and exposes bug 5482 in full force. Ideally, we would completely drop the disk span limit, and find a way to manage reads that balances overread, computation and I/O request costs.
For people interested: I've seen a couple of repositories at hand showing this kind of issue, and I've been looking into it this week.
(In reply to Pierre-Yves David from comment #8) Does the problem go away on your repositories if you run a redeltamultibase with my extension? Hopefully I will soon have some time to integrate our fix back, but 5482 may require a decision from the Steering Committee on how to go forward (go with straight scatter/gather reads and accept the performance penalty on Windows, keep reading contiguous regions and accept the read amplification hazard, or something in between).
I've submitted a mitigation for stable: https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-June/100425.html We need a deeper rework of the delta code and heuristics in core; I'll create a wiki page for this soon.
Fixed by https://mercurial-scm.org/repo/hg/rev/e9d325cfe071

Pierre-Yves David <pierre-yves.david@octobus.net>
revlog: add an experimental option to mitigate delta issues (issue5480)

The general delta heuristic used to select a delta does not scale with the number of branches. The delta base is frequently too far away to allow reusing a chain according to the "distance" criterion. This leads to the insertion of larger deltas (or even full texts) that themselves push the bases for the next deltas further away, leading to more large deltas and full texts. These full texts and the frequent recomputations throw Mercurial performance into disarray.

For example, on a slightly large repository:

  280 000 files (2 150 000 versions)
  430 000 changesets (10 000 topological heads)

The numbers below compare the repository with and without the distance criterion:

  manifest size:
    with:    21.4 GB
    without:  0.3 GB
  store size:
    with:    28.7 GB
    without:  7.4 GB
  bundle last 15 000 revisions:
    with:    800 seconds, 971 MB
    without:  50 seconds,  73 MB
  unbundle time (of the last 15K revisions):
    with:    1150 seconds (~19 minutes)
    without:   35 seconds

Similar issues have been observed in other repositories.

Adding a new option or "feature" on stable is uncommon. However, given that this issue is making Mercurial practically unusable, I'm exceptionally targeting this patch for stable.

What is actually needed is a full rework of the delta building and reading logic. However, that will be a longer process, with churn not suitable for stable. In the meantime, we introduce a quick and dirty mitigation in the 'experimental' config space. The new option introduces a way to set the maximum amount of memory usable to store a diff in memory. This extends Mercurial's ability to create chains without removing all safeguards regarding memory access. The option should be phased out when core has a more proper solution available. Setting the limit to '0' removes all limits; setting it to '-1' uses the default limit (textsize x 4).

(please test the fix)
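For reference, a minimal hgrc sketch of enabling the mitigation (the option name `experimental.maxdeltachainspan` is the one given later in this thread; the '0' / '-1' semantics are from the commit message above):

```ini
[experimental]
# 0 removes the span limit entirely; -1 keeps the default (textlen * 4)
maxdeltachainspan = 0
```

Note that existing revlogs must be rewritten (e.g. via debugupgraderepo) before the relaxed limit affects already-stored deltas.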
Fixed by https://mercurial-scm.org/repo/hg/rev/895ecec31c70 (same commit message as above). (please test the fix)
Bug was set to TESTING for 10 days, resolving
The issue is not fixed. We just have a mitigation option.
Bug was inactive for 150 days, archiving
Bad bot.
Gábor Stefanik, did you try the new `experimental.maxdeltachainspan` option to see if it's improving your situation? You can try it with `hg debugupgraderepo -o redeltaall --run --config experimental.maxdeltachainspan=0`. We are still working on a solution that can be on by default but this workaround has proved to fix the issue on big repositories like yours.
Fixed by https://mercurial-scm.org/repo/hg/rev/f8762ea73e0d

Paul Morelle <paul.morelle@octobus.net>
sparse-revlog: implement algorithm to write sparse delta chains (issue5480)

The classic behavior of revlog._isgooddeltainfo is to consider the span size of the whole delta chain and limit it to 4 * textlen. Once sparse-revlog writing is allowed (and enforced with a requirement), revlog._isgooddeltainfo considers the span of the largest chunk as the distance used in the verification, instead of using the span of the whole delta chain.

In order to compute the span of the largest chunk, we slice into chunks a chain with the new revision at the top of the revlog, and take the maximal span of these chunks. The sparse read density is a parameter to the slicing: slicing stops when the global read density reaches this threshold. For instance, a density of 50% means that 2 of every 4 bytes read are actually used for the reconstruction of the revision (the others are part of other chains).

This allows a new revision to potentially be stored as a diff against another revision anywhere in the history, instead of being forced into the last 4 * textlen. The result is much better compression on repositories that have many concurrent branches.
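The chain-slicing idea described above can be sketched as follows. This is an illustrative toy, not the actual revlog implementation: given the on-disk (offset, length) of each revision in a candidate chain, drop the largest gaps between consecutive chain members until the read density (useful bytes / total bytes read) meets the threshold; each resulting chunk is then read separately, and the largest chunk span stands in for the whole-chain span in the size check.

```python
def slice_chain(revs, layout, target_density=0.5):
    """revs: chain revisions sorted by disk offset; layout: rev -> (offset, len)."""
    useful = sum(layout[r][1] for r in revs)
    # gap between each pair of consecutive chain members on disk
    gaps = []
    for a, b in zip(revs, revs[1:]):
        end_a = layout[a][0] + layout[a][1]
        gaps.append((layout[b][0] - end_a, a))
    span = layout[revs[-1]][0] + layout[revs[-1]][1] - layout[revs[0]][0]
    cut_after = set()
    # Remove the biggest gaps first until density is acceptable.
    for gap, after_rev in sorted(gaps, reverse=True):
        if useful / span >= target_density:
            break
        cut_after.add(after_rev)
        span -= gap
    # Build contiguous chunks between the cuts.
    chunks, cur = [], [revs[0]]
    for r in revs[1:]:
        if cur[-1] in cut_after:
            chunks.append(cur)
            cur = [r]
        else:
            cur.append(r)
    chunks.append(cur)
    return chunks

# Chain 0 -> 1 -> 4 with 170 KB of unrelated data between revs 1 and 4:
layout = {0: (0, 1000), 1: (1000, 50), 4: (171050, 60)}
print(slice_chain([0, 1, 4], layout))  # [[0, 1], [4]]
```

Slicing the chain into [[0, 1], [4]] means the 170 KB of unrelated data is never read, and the largest chunk span (1050 bytes) easily passes a 4 * textlen check that the full 171 KB span would have failed.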
Here is a comparison between deltas from current upstream (aggressive-merge-deltas on by default) and deltas from a sparse-revlog.

Comparison of `.hg/store/` size:

  mercurial (6.74% merges):
    before:      46,831,873 bytes
    after:       46,795,992 bytes  (no relevant change)
  pypy (8.30% merges):
    before:     333,524,651 bytes
    after:      308,417,511 bytes  (-8%)
  netbeans (34.21% merges):
    before:   1,141,847,554 bytes
    after:    1,131,093,161 bytes  (-1%)
  mozilla-central (4.84% merges):
    before:   2,344,248,850 bytes
    after:    2,328,459,258 bytes  (-1%)
  large-private-repo-A (19.73% merges):
    before:  41,510,550,163 bytes
    after:    8,121,763,428 bytes  (-80%)
  large-private-repo-B (23.77% merges):
    before:  58,702,221,709 bytes
    after:    8,351,588,828 bytes  (-76%)

Comparison of `00manifest.d` size:

  mercurial (6.74% merges):
    before:       6,143,044 bytes
    after:        6,107,163 bytes
  pypy (8.30% merges):
    before:      52,941,780 bytes
    after:       27,834,082 bytes  (-48%)
  netbeans (34.21% merges):
    before:     130,088,982 bytes
    after:      119,337,636 bytes  (-10%)
  mozilla-central (4.84% merges):
    before:     215,096,339 bytes
    after:      199,496,863 bytes  (-8%)
  large-private-repo-A (19.73% merges):
    before:  33,725,285,081 bytes
    after:      390,302,545 bytes  (-99%)
  large-private-repo-B (23.77% merges):
    before:  49,457,701,645 bytes
    after:    1,366,752,187 bytes  (-97%)

The better delta chains provide a performance boost in relevant repositories:

  pypy, bundling 1000 revisions:
    before: 1.670s
    after:  1.149s  (-31%)

Unbundling got a bit slower, probably because the sparse algorithm is still pure Python:

  pypy, unbundling 1000 revisions:
    before: 4.062s
    after:  4.507s  (+10%)

Performance of bundle/unbundle in repositories with few concurrent branches (e.g. mercurial) is unaffected. No significant differences have been noticed when timing `hg push` and `hg pull` locally. More stable timings are being gathered.

As with aggressive-merge-delta, better deltas come with longer delta chains, and longer chains have a performance impact. For example:
The length of the chain needed to get the manifest of pypy's tip moves from 82 items to 1929 items. This moves the restore time from 3.88ms to 11.3ms. Delta chain length is an independent issue that affects repositories even without this change; it will be dealt with independently.

No significant differences have been observed on repositories where `sparse-revlog` does not have much effect (mercurial, unity, netbeans). On pypy, small differences have been observed on some operations affected by delta chain building and retrieval:

  pypy, perfmanifest:
    before: 0.006162s
    after:  0.017899s  (+190%)
  pypy, commit:
    before: 0.382s
    after:  0.376s  (-1%)
  pypy, status:
    before: 0.157s
    after:  0.168s  (+7%)

More comprehensive and stable timing comparisons are in progress.

(please test the fix)
Bug was set to TESTING for 7 days, resolving