[Bug 5482] New: revlog._chunks wastes IO for discontiguous delta chains
mercurial-bugs at mercurial-scm.org
Thu Feb 9 08:44:03 UTC 2017
https://bz.mercurial-scm.org/show_bug.cgi?id=5482
Bug ID: 5482
Summary: revlog._chunks wastes IO for discontiguous delta chains
Product: Mercurial
Version: default branch
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: bug
Priority: normal
Component: Mercurial
Assignee: bugzilla at mercurial-scm.org
Reporter: gabor.stefanik at nng.com
CC: mercurial-devel at selenic.com
revlog._chunks always collapses its incoming revs parameter into a single
(start, end) pair, which it passes to _chunkraw. This is fine for
non-generaldelta revlogs, since delta chains there are guaranteed to be
contiguous.
However, generaldelta allows delta chains to be discontiguous. In that case,
reading one contiguous range also reads the unrelated revisions stored between
the chain's members, which wastes IO. To avoid wasting too much IO, we
currently limit delta chains to span no more than 4 compressed full revisions'
worth of data, which severely hurts compression in highly branched repositories
(see bug 5480).
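
To make the waste concrete, here is a minimal sketch that compares the bytes
read by a single (start, end) span with the bytes the chain actually needs.
The index layout (rev -> (offset, length)), the function name single_span_cost
and all the numbers are hypothetical and chosen for illustration; this is not
the revlog code itself.

```python
# Hypothetical toy index: rev -> (byte offset in the data file, compressed length).
# The numbers are illustrative only, not taken from a real revlog.
TOY_INDEX = {
    0: (0, 100),
    1: (100, 20),
    2: (120, 30),
    3: (150, 25),
    4: (175, 10),
}


def single_span_cost(revs, index):
    """Return (bytes_read, bytes_needed) when revs are fetched as one span.

    revs is assumed to be sorted by offset, as a delta chain would be.
    """
    start = index[revs[0]][0]
    last_offset, last_length = index[revs[-1]]
    bytes_read = last_offset + last_length - start
    bytes_needed = sum(index[r][1] for r in revs)
    return bytes_read, bytes_needed


# Contiguous chain: nothing is wasted.
print(single_span_cost([0, 1, 2], TOY_INDEX))  # (150, 150)

# Discontiguous chain: revs 1-3 belong to other chains but are read anyway.
print(single_span_cost([0, 4], TOY_INDEX))  # (185, 110)
```

In the second case 75 of the 185 bytes read are discarded.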
We should instead split revs into contiguous subranges and call _chunkraw on
each one individually, or at least (if performance requires it) skip "large"
gaps between subranges. I'd suggest defining "large" as more than
C * (compresseddeltalen / (len(subranges) - 1)), where C is a config option
with a default value to be determined experimentally. C needs to be
configurable because the optimal value will likely differ between an HDD and
an SSD; indeed, C=0 (skip every gap) is probably best for SSDs.
Again, it would be best if we could just skip all gaps, but I don't know how
that will affect performance.
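
A rough sketch of the proposal, under stated assumptions: the index is modelled
as a plain dict of rev -> (offset, length), compresseddeltalen is taken to be
the total compressed size of the chain's revisions, and split_into_reads is a
hypothetical helper name, not an existing revlog method. Each returned
(start, end) byte range would correspond to one _chunkraw-style read.

```python
def split_into_reads(revs, index, c, compresseddeltalen):
    """Group revs into (start, end) byte ranges, bridging only small gaps.

    revs must be sorted by offset; index maps rev -> (offset, length).
    A gap is "large" (and therefore skipped) when it exceeds
    c * (compresseddeltalen / (len(subranges) - 1)).
    """
    # First pass: build the maximal contiguous subranges.
    subranges = []
    for rev in revs:
        offset, length = index[rev]
        if subranges and subranges[-1][1] == offset:
            subranges[-1][1] = offset + length
        else:
            subranges.append([offset, offset + length])

    if len(subranges) < 2:
        return [tuple(r) for r in subranges]

    threshold = c * (compresseddeltalen / (len(subranges) - 1))

    # Second pass: read through a gap unless it is "large".
    reads = [list(subranges[0])]
    for start, end in subranges[1:]:
        gap = start - reads[-1][1]
        if gap > threshold:
            reads.append([start, end])
        else:
            reads[-1][1] = end
    return [tuple(r) for r in reads]


# Hypothetical index; revs 2, 3 and 5 belong to other chains.
INDEX = {
    0: (0, 100), 1: (100, 20), 2: (120, 30), 3: (150, 25),
    4: (175, 10), 5: (185, 500), 6: (685, 15),
}
CHAIN = [0, 1, 4, 6]
CHAIN_LEN = sum(INDEX[r][1] for r in CHAIN)  # 145

print(split_into_reads(CHAIN, INDEX, 0, CHAIN_LEN))
# [(0, 120), (175, 185), (685, 700)]
print(split_into_reads(CHAIN, INDEX, 1, CHAIN_LEN))
# [(0, 185), (685, 700)]
```

With C=0 every gap is skipped; with C=1 the 55-byte gap is read through while
the 500-byte gap is skipped. A sufficiently large C reproduces the current
single-span behaviour.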