[Bug 5480] New: Manifest grows out of control in large repository with hundreds of branches, regular merges and a refactor involving 1000s of file moves

mercurial-bugs at mercurial-scm.org
Thu Feb 9 06:56:24 UTC 2017


https://bz.mercurial-scm.org/show_bug.cgi?id=5480

            Bug ID: 5480
           Summary: Manifest grows out of control in large repository with
                    hundreds of branches, regular merges and a refactor
                    involving 1000s of file moves
           Product: Mercurial
           Version: default branch
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: bug
          Priority: normal
         Component: Mercurial
          Assignee: bugzilla at mercurial-scm.org
          Reporter: gabor.stefanik at nng.com
                CC: mercurial-devel at selenic.com

We have a very large internal repository with 30000+ files in the tip, 100+
active branches with regular merges between them, and almost 60000 revisions.

Last year, we undertook a large refactoring of our code that caused _all_ files
in the repository to be moved to new locations (the entire code tree was
rearranged into a logical scheme). Some older maintenance branches were created
before the refactor, so they still have the old layout; we still have to
regularly merge from these branches into our main trunk, which has the new
layout.

Since approximately the start of this refactor, our manifest file has started
growing out of control, similar to how it behaved prior to generaldelta.
Enabling aggressivemergedeltas doesn't help much: it shrinks the manifest from
1.3GB to 1GB, which is still far larger than the ~100MB we had around rev
43000.

Looking at the revlog shows that around that time, the average delta chain
length dropped from the low thousands to the low double digits. Many revisions
since then are stored as old-style (r-1) deltas instead of being deltified
against a parent. The culprit appears to be the "dest" check in _isgooddelta:
the computed value often exceeds textlen*4 because of intervening revisions
from other branches, which causes otherwise good deltas to be discarded.

According to a comment in the code, the "dest" check is supposed to provide an
upper bound to the I/O needed to read a revision later, but it's calculated
using an algorithm that gives a reasonable upper bound only with the old (r-1)
delta format. With generaldelta, it erroneously counts unrelated revisions in
between revisions of the delta chain, and gives an unreasonably high upper
bound. Thus, many good deltas are rejected, and the resulting suboptimal
storage further spreads out the delta chains in the revlog, causing yet more
"dest" false positives, and thus even worse compression. Eventually,
storage efficiency degrades to what it was before generaldelta.
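
To illustrate the overcounting described above, here is a minimal toy model. It
is not Mercurial's revlog code: Entry, chain, span_bytes, chain_bytes and build
are hypothetical names, and the layout is simplified to one chunk per revision
stored back to back. It compares a span-style bound (everything between the
start of the chain base and the end of the new delta) with the bytes that
actually belong to the delta chain, using the textlen*4 limit mentioned in this
report.

```python
# Toy model only: NOT Mercurial's revlog implementation.
# All names here are hypothetical and exist purely for illustration.
from dataclasses import dataclass


@dataclass
class Entry:
    offset: int  # byte offset of the stored chunk in the data file
    size: int    # stored chunk size in bytes
    base: int    # revision this delta applies to; own index means full snapshot


def build(sizes_and_bases):
    """Lay chunks out back to back, in revision order."""
    entries, offset = [], 0
    for size, base in sizes_and_bases:
        entries.append(Entry(offset, size, base))
        offset += size
    return entries


def chain(entries, rev):
    """Revisions that must be read to rebuild rev, chain base first."""
    revs = [rev]
    while entries[rev].base != rev:
        rev = entries[rev].base
        revs.append(rev)
    return revs[::-1]


def span_bytes(entries, rev):
    """Span-style bound: from the chain base's start to the end of rev."""
    first = chain(entries, rev)[0]
    return entries[rev].offset + entries[rev].size - entries[first].offset


def chain_bytes(entries, rev):
    """Bytes that belong to the delta chain itself."""
    return sum(entries[r].size for r in chain(entries, rev))


TEXTLEN = 1000  # assumed full-text size of the revision being stored

# rev 0: snapshot on branch A; revs 1-4: unrelated chunks from other branches;
# rev 5: a small delta against rev 0 (its parent in generaldelta terms).
entries = build([(1000, 0), (1000, 1), (1000, 2), (1000, 3), (1000, 4), (50, 0)])

span = span_bytes(entries, 5)
needed = chain_bytes(entries, 5)
print("span bound:", span, "rejected:", span > TEXTLEN * 4)
print("chain bytes:", needed, "rejected:", needed > TEXTLEN * 4)
```

In this layout the span-style bound counts the four unrelated chunks sitting
between rev 0 and rev 5, so a 50-byte delta is rejected even though rebuilding
rev 5 only requires the chunks for revs 0 and 5.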

A proof-of-concept fix (simply dropping the dest check) allows the manifest to
shrink to ~150MB, and further down to ~60MB with aggressivemergedeltas enabled.


