[Bug 5482] New: revlog._chunks wastes IO for discontiguous delta chains

mercurial-bugs at mercurial-scm.org mercurial-bugs at mercurial-scm.org
Thu Feb 9 08:44:03 UTC 2017


https://bz.mercurial-scm.org/show_bug.cgi?id=5482

            Bug ID: 5482
           Summary: revlog._chunks wastes IO for discontiguous delta
                    chains
           Product: Mercurial
           Version: default branch
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: bug
          Priority: normal
         Component: Mercurial
          Assignee: bugzilla at mercurial-scm.org
          Reporter: gabor.stefanik at nng.com
                CC: mercurial-devel at selenic.com

revlog._chunks always converts its incoming parameter revs into a single
(start, end) pair, which it passes to _chunkraw. This is fine for
non-generaldelta revlogs, since delta chains there are guaranteed to be
contiguous.
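To make the cost concrete, here is a small Python model. It is hypothetical: each index entry is reduced to (offset, compressed length), and the names single_span and wasted_bytes are made up for illustration; the real revlog index carries more fields, and _chunks/_chunkraw have their own signatures. It computes the single span read today and how many of those bytes belong to revisions outside the requested chain.

```python
from typing import List, Tuple

# Hypothetical index model: entry i is (offset, compressed_length) of
# revision i's chunk in the revlog data file. Real index entries carry
# more fields than this.
Index = List[Tuple[int, int]]


def single_span(index: Index, revs: List[int]) -> Tuple[int, int]:
    """Return the one (start, end) byte range covering all of `revs`,
    i.e. the single range the report says _chunks reads today."""
    first, last = min(revs), max(revs)
    start = index[first][0]
    end = index[last][0] + index[last][1]
    return start, end


def wasted_bytes(index: Index, revs: List[int]) -> int:
    """Bytes inside the single span that belong to revisions not in `revs`."""
    start, end = single_span(index, revs)
    needed = sum(index[r][1] for r in set(revs))
    return (end - start) - needed


# Rev 2 is a 500-byte chunk from another branch, not part of the chain.
index = [(0, 100), (100, 50), (150, 500), (650, 40)]
print(wasted_bytes(index, [0, 1, 2, 3]))  # contiguous chain: 0
print(wasted_bytes(index, [0, 1, 3]))     # chain skips rev 2: 500
```

For a non-generaldelta chain the requested revisions are always adjacent, so the waste is zero; once a generaldelta chain skips over other branches' revisions, every skipped byte is read and thrown away.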

However, generaldelta allows delta chains to be discontiguous. In that case,
reading one contiguous range also reads every unrelated revision stored
between the chain's members, which wastes IO. To avoid wasting too much IO, we
currently limit delta chains to span no more than 4 compressed full revisions'
worth of data, which severely hurts compression in highly branched repositories
(see bug 5480).
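A simplified sketch of the span limit described above, assuming the check compares the byte distance from the start of the chain to the end of the candidate delta against 4 compressed full revisions. The function and parameter names are hypothetical, and the exact quantities Mercurial's delta selection uses may differ.

```python
# Simplified, hypothetical model of the current delta-chain span limit.
def within_span_limit(chain_start: int, delta_end: int,
                      compressed_fulltext_len: int, factor: int = 4) -> bool:
    """Accept a delta only if reading from the chain's first byte to the
    end of the new delta covers at most `factor` compressed full revisions."""
    span = delta_end - chain_start
    return span <= factor * compressed_fulltext_len


# A 200-byte compressed fulltext allows at most an 800-byte read span.
print(within_span_limit(0, 800, 200))   # True
print(within_span_limit(0, 1000, 200))  # False
```

The second call is rejected purely because of distance: the candidate delta itself may be tiny, but revisions from other branches stored between it and its base push the span past the limit. In a highly branched repository this forces a new full revision far more often than the chain's own size would justify.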

We should instead split up revs into contiguous subranges and _chunkraw them
individually, or at least (if performance requires it) skip only "large" gaps
between subranges. I'd suggest defining "large" as more than
C * (compresseddeltalen / (len(subranges)-1)), i.e. C times the chain's
compressed size divided by the number of gaps, where C is a config option with
a default value to be determined experimentally. C needs to be configurable
because the optimal C will likely differ between HDDs and SSDs; indeed, C=0
(skip every gap) is probably best for SSDs.
Again, it would be best if we could just skip all gaps, but I don't know how
that would affect performance.
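A sketch of the proposed splitting, using the same hypothetical (offset, compressed length) index model. It first groups the chain into gap-free subranges, then reads through any gap no larger than C * (compresseddeltalen / (len(subranges)-1)) and skips the rest. Here compresseddeltalen is assumed to be the summed compressed length of the requested chunks; contiguous_subranges, read_ranges and the merge policy are illustrative assumptions, not Mercurial code.

```python
from typing import List, Tuple

# Hypothetical index model: entry i is (offset, compressed_length) of
# revision i's chunk in the revlog data file.
Index = List[Tuple[int, int]]


def contiguous_subranges(index: Index, revs: List[int]) -> List[Tuple[int, int]]:
    """Group the chunks of `revs` into maximal gap-free (start, end) ranges."""
    ranges: List[Tuple[int, int]] = []
    for rev in sorted(set(revs)):
        start, length = index[rev]
        end = start + length
        if ranges and ranges[-1][1] == start:
            # This chunk begins exactly where the previous range ends.
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges


def read_ranges(index: Index, revs: List[int], c: float) -> List[Tuple[int, int]]:
    """Return the byte ranges to read, skipping only "large" gaps.

    A gap is large when it exceeds c * (compresseddeltalen / number of gaps);
    smaller gaps are read through to save a seek.
    """
    subranges = contiguous_subranges(index, revs)
    if len(subranges) <= 1:
        return subranges
    compresseddeltalen = sum(index[r][1] for r in set(revs))
    threshold = c * compresseddeltalen / (len(subranges) - 1)
    merged = [subranges[0]]
    for start, end in subranges[1:]:
        gap = start - merged[-1][1]
        if gap > threshold:
            merged.append((start, end))        # large gap: skip it
        else:
            merged[-1] = (merged[-1][0], end)  # small gap: read through it
    return merged


# Revs 2 and 4 belong to other branches; the chain is 0 -> 1 -> 3 -> 5.
index = [(0, 100), (100, 50), (150, 500), (650, 40), (690, 1000), (1690, 30)]
chain = [0, 1, 3, 5]
print(read_ranges(index, chain, 0))   # [(0, 150), (650, 690), (1690, 1720)]
print(read_ranges(index, chain, 5))   # [(0, 690), (1690, 1720)]
print(read_ranges(index, chain, 10))  # [(0, 1720)]
```

With C=0 the threshold is zero, so every gap is skipped (the "just skip all gaps" option); with a large enough C everything collapses back into the single span read today. Intermediate values trade a few wasted bytes for fewer seeks, which is why the best C likely depends on the storage medium.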
