[Bug 4961] New: generaldelta revlogs read too much data

mercurial-bugs at selenic.com mercurial-bugs at selenic.com
Mon Nov 23 01:19:50 UTC 2015


https://bz.mercurial-scm.org/show_bug.cgi?id=4961

            Bug ID: 4961
           Summary: generaldelta revlogs read too much data
           Product: Mercurial
           Version: default branch
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: bug
          Priority: urgent
         Component: Mercurial
          Assignee: bugzilla at selenic.com
          Reporter: gregory.szorc at gmail.com
                CC: mercurial-devel at selenic.com
            Blocks: 4861

generaldelta revlogs can potentially read many more bytes than are necessary to
resolve a revision.

The relevant code path:

1. The delta chain's revisions are collected and their decompressed binary
deltas are requested at
https://selenic.com/repo/hg/file/df9b73d2d444/mercurial/revlog.py#l1111
2. That request loads the full linear range of revisions at
https://selenic.com/repo/hg/file/df9b73d2d444/mercurial/revlog.py#l1016

The existing code assumes delta chains are linear in the revlog. With
generaldelta, this is no longer true: your delta chain could be something like
[1000, 1001, 1002, 1500, 1501, 1502, 2000]. The code as is would read
revisions 1000-2000 as opposed to just the set of 7 revisions comprising the
delta chain.
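
To make the difference concrete, here is a minimal sketch of the two read
strategies for the example chain above. It is not Mercurial's code: the
helpers start(), length(), span_bytes(), chain_bytes() and contiguous_runs()
are hypothetical, and it assumes every revision is 100 bytes and stored back
to back, which real revlogs do not guarantee.

```python
# Hypothetical model: every revision is REV_SIZE bytes, stored back to back.
REV_SIZE = 100


def start(rev):
    """Byte offset of a revision in the (assumed) data file."""
    return rev * REV_SIZE


def length(rev):
    """Stored length of a revision (constant in this model)."""
    return REV_SIZE


def span_bytes(chain):
    """Bytes read when one linear range covering the whole chain is fetched."""
    first, last = chain[0], chain[-1]
    return start(last) + length(last) - start(first)


def chain_bytes(chain):
    """Bytes actually needed: only the revisions in the delta chain."""
    return sum(length(rev) for rev in chain)


def contiguous_runs(chain):
    """Group a sorted chain into runs of adjacent revisions."""
    runs = []
    for rev in chain:
        if runs and rev == runs[-1][-1] + 1:
            runs[-1].append(rev)
        else:
            runs.append([rev])
    return runs


chain = [1000, 1001, 1002, 1500, 1501, 1502, 2000]
print(span_bytes(chain))       # 100100 bytes for revisions 1000-2000
print(chain_bytes(chain))      # 700 bytes for the 7 revisions in the chain
print(contiguous_runs(chain))  # three runs that could be read separately
```

Reading each contiguous run separately is only one possible way to avoid the
extra data; whether several smaller reads beat one large read depends on the
storage and is not something this sketch measures.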

On a conversion of mozilla-central with generaldelta but no reordering as part
of the conversion, I witnessed excessive revlog reading due to non-linear delta
chains:

(Pdb) len(revs)
43774
(Pdb) revs[0]
180911
(Pdb) revs[-1]
272220
(Pdb) p len(range(revs[0], revs[-1]))
91309
(Pdb) sum([self.length(x) for x in revs])
23,404,355
...
(Pdb) p len(data)
41,549,824
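
As a rough reading of the numbers above, the following sketch computes how
much of the data read was not needed. The figures are copied from the
debugger session; the variable names are mine.

```python
# Figures copied from the debugger session above.
chain_revs = 43774          # len(revs)
first_rev, last_rev = 180911, 272220
needed_bytes = 23404355     # sum of the lengths of the chain's revisions
read_bytes = 41549824       # len(data), what was actually read

span_revs = last_rev - first_rev      # matches len(range(revs[0], revs[-1]))
wasted_bytes = read_bytes - needed_bytes

print(span_revs)                                  # 91309
print(wasted_bytes)                               # 18145469
print(round(read_bytes / needed_bytes, 2))        # 1.78
print(round(100 * wasted_bytes / read_bytes, 1))  # 43.7
```

In other words, roughly 1.78 times the required bytes were read, and about
44% of the data read was not part of the delta chain.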

This could drastically slow down commands that only need to read a single
manifest revision. Yes, the page cache should cache this data, but performance
will still be bad on uncached data. And you could imagine a worst case
scenario where a repo has multiple active heads and delta chains: as the
number of active heads grows, the amount of data that must be read to resolve
a manifest increases proportionally.
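
The worst case can be illustrated with a toy model. This is an assumption
for illustration only: it supposes equal-sized revisions and delta chains
from each head interleaved round-robin in the revlog, and
read_amplification() is a hypothetical helper, not a Mercurial function.

```python
def read_amplification(num_heads, chain_len):
    """Ratio of revisions read (linear span) to revisions needed (chain).

    Toy layout: chains from num_heads heads are interleaved round-robin, so
    one head's chain occupies revisions 0, num_heads, 2 * num_heads, ...
    """
    chain = [i * num_heads for i in range(chain_len)]
    span = chain[-1] - chain[0] + 1
    return span / chain_len


for heads in (1, 2, 4, 8):
    print(heads, read_amplification(heads, 1000))
```

Under these assumptions the ratio approaches the number of heads as the chain
gets longer, which is the proportional growth described above.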

Fortunately, we only decompress the revisions that are actually needed, not
every revision that was read.


Referenced Bugs:

https://bz.mercurial-scm.org/show_bug.cgi?id=4861
[Bug 4861] Make generaldelta the default