xrevlog: experimental reimplementation of revlog in C
Greg Ward
greg at gerg.ca
Wed Nov 10 08:58:01 CST 2010
On Tue, Nov 9, 2010 at 10:59 AM, Matt Mackall <mpm at selenic.com> wrote:
> On Tue, 2010-11-09 at 09:25 -0500, Greg Ward wrote:
>> for ages now, it has really bugged me that Mercurial reads the entire
>> changelog index into memory for almost any command, and that it
>> creates a Python list *and* dictionary for it. On our main repo at
>> work, that's two 112,000-element containers every time you run (say)
>> "hg tip".
>
> Performance numbers?
OK, I sat down tonight to do some careful performance tests, with
numbers and everything.
Executive summary: in the best case task (roughly "hg tip"), xrevlog is
~500x faster than the Mercurial API. Worst case ("hg debugdata" in a
random subset of revs in forward order), it's ~4x faster. But I got
some seg faults from xrevlog, so clearly it's not done yet; I'll have to
repeat this experiment when the bugs are fixed.
Details:
I implemented four tasks in Python and C:
def openclose(filename):
"""open the revlog and immediately close it"""
def readindex(filename):
"""open the revlog in filename and read the entire index"""
def readtip(filename):
"""open the revlog in filename and read the data for the latest revision"""
def readdata(filename):
"""open the revlog in filename and read a random 10% of the entries
(in forward order)"""
Of those, only readtip() is a meaningful approximation of something that
an everyday hg command does, namely "hg tip". You can see the two
implementations here:
http://hg.gerg.ca/xrevlog/file/tip/progs/perftest.py
http://hg.gerg.ca/xrevlog/file/tip/progs/perftest.c
(Or just clone the repo.) If something looks fishy, like I'm unfairly
comparing Python and C by doing more work in one or the other, it
probably is. I'm only human and I just knocked this stuff off 90 min
last night. ;-)
For the timings, I used the standard timeit module to do 3 repeats of 50
runs of each task. I reimplemented the same algorithm as timeit in C,
also doing 3 x 50 runs. I'll just report the minimum runtime from each
repeat; this is the runtime of all 50 iterations.
First, a medium-sized revlog: Mercurial's changelog (~12k revisions).
In Python:
openclose min 1512.3 ms
readindex min 6247.5 ms
readtip min 1552.0 ms
readdata min 7204.7 ms
In C using xrevlog:
openclose min 1.1 ms # 1300x speedup (but who cares?)
readindex min 231.7 ms # 26x speedup
readtip min 3.1 ms # 500x speedup (useful!)
readdata min 1681.4 ms # 4.3x speedup
The speedup on openclose is irrelevant because that's not a real-world
operation. Reading the data for just the tip revision is, though, so I
like that 500x speedup.
I repeated the above with Mercurial's manifest log... and got a segfault
in xrevlog. Sigh. Maybe later.
Now with a large revlog, the changelog from our big repo at work (113000
revisions):
Python/Mercurial:
openclose min 374 ms
readindex min 147624 ms
readtip min 484 ms
readdata min 105541 ms
(N.B. I only ran 5 iterations of each task this time because I got bored
waiting for 50. So I multiplied the times by 10 in order to compare
with the Mercurial changelog above.)
Odd: I have no idea why openclose and readtip are faster here than with
Mercurial's changelog. There are ~10x as many revisions and ~10x as
many bytes in this one. Is this lazyparser kicking in?
Now C/xrevlog:
openclose min 3 ms
readindex min 2024 ms
readtip min 5 ms
readdata *segfault*
Sigh. Clearly I have more work to do. It looks promising, though.
Greg
More information about the Mercurial-devel
mailing list