xrevlog: experimental reimplementation of revlog in C

Greg Ward greg at gerg.ca
Tue Nov 9 17:18:48 CST 2010


On Tue, Nov 9, 2010 at 10:59 AM, Matt Mackall <mpm at selenic.com> wrote:
> On Tue, 2010-11-09 at 09:25 -0500, Greg Ward wrote:
>> Hi folks --
>>
>> for ages now, it has really bugged me that Mercurial reads the entire
>> changelog index into memory for almost any command, and that it
>> creates a Python list *and* dictionary for it.  On our main repo at
>> work, that's two 112,000-element containers every time you run (say)
>> "hg tip".
>
> Performance numbers?

It's been several weeks, so this is from memory.  I wrote the same
simple programs twice, once in Python with the Mercurial API and once
in C using xrevlog: dumpindex is equivalent to "hg debugindex" and
dumpdata to "hg debugdata".

ISTR that the xrevlog-based C programs are about 20x faster than the
Python code.  That jumped to ~100x when I compiled with -O2, but since
then I have switched to using function pointers for homebrew OO, which
probably makes life harder for the optimizer.

It's harder to assess memory usage.  Ideas are welcome.  But I made a
point of *not* reading the entire index into memory when there is a
separate .d file; after all, in that case it's trivial to mmap() the
index file and read just the necessary records.
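
To make the mmap() idea concrete, here is a minimal sketch of mapping
an index file and decoding a single record on demand.  It is not the
xrevlog code: map_index(), read_record() and struct index_record are
hypothetical names, and the layout (64-byte big-endian records, with
the version header overlaying the first 4 bytes of record 0) is my
understanding of the RevlogNG format, so treat it as an assumption.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define RECORD_SIZE 64  /* assumed size of one RevlogNG index record */

/* Hypothetical decoded record; the field names are illustrative. */
struct index_record {
    uint64_t offset;        /* 48-bit offset of the chunk in the .d file */
    uint16_t flags;
    uint32_t comp_len;      /* compressed chunk length */
    uint32_t uncomp_len;    /* uncompressed length */
    int32_t base_rev;
    int32_t link_rev;
    int32_t parent1;
    int32_t parent2;
    unsigned char node[20]; /* first 20 of the 32 node bytes */
};

/* Read a big-endian 32-bit integer. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Map the whole index file read-only; pages are only faulted in when
 * touched.  Returns NULL on failure or for an empty file. */
const unsigned char *map_index(const char *path, size_t *len)
{
    struct stat st;
    void *base;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);              /* the mapping outlives the descriptor */
    if (base == MAP_FAILED)
        return NULL;
    *len = (size_t)st.st_size;
    return base;
}

/* Decode record 'rev' straight out of the mapping.  Returns 0 on
 * success, -1 if 'rev' is past the end of the index. */
int read_record(const unsigned char *base, size_t len, size_t rev,
                struct index_record *out)
{
    const unsigned char *p;

    if (rev >= len / RECORD_SIZE)
        return -1;
    p = base + rev * RECORD_SIZE;
    /* Record 0 stores the version header where its offset would be,
     * so its offset is taken to be 0. */
    out->offset = rev == 0 ? 0 :
        ((uint64_t)be32(p) << 16) | ((uint64_t)p[4] << 8) | p[5];
    out->flags = (uint16_t)((p[6] << 8) | p[7]);
    out->comp_len = be32(p + 8);
    out->uncomp_len = be32(p + 12);
    out->base_rev = (int32_t)be32(p + 16);
    out->link_rev = (int32_t)be32(p + 20);
    out->parent1 = (int32_t)be32(p + 24);
    out->parent2 = (int32_t)be32(p + 28);
    memcpy(out->node, p + 32, sizeof out->node);
    return 0;
}
```

A caller would map the .i file once with map_index(), call
read_record() for whichever revisions it needs, and munmap() the
region when done.  Looking up a single revision then touches one page
of the file rather than building a container for every record.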

For inline data I have to parse the whole index, but there probably
aren't too many records in a revlog with inline data.  My goal is to
reduce the time spent opening the changelog, and if the changelog is
big enough for this to matter, then it has a separate .d file.
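
To illustrate why the inline case needs a full parse, here is a rough
sketch of walking an inline revlog.  It assumes each 64-byte index
record is followed directly by that revision's compressed chunk, with
the chunk length stored big-endian at byte 8 of the record, so record
N cannot be located without visiting records 0..N-1.  scan_inline()
is a hypothetical helper, not part of xrevlog.

```c
#include <stddef.h>
#include <stdint.h>

#define RECORD_SIZE 64  /* assumed size of one RevlogNG index record */

/* Read a big-endian 32-bit integer. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Walk an inline revlog image, where every index record is followed
 * by its own data chunk.  Stores the byte position of each record in
 * positions[] (at most 'max' entries) and returns the number of
 * revisions found, or -1 if the image is truncated. */
long scan_inline(const unsigned char *buf, size_t len,
                 size_t *positions, size_t max)
{
    size_t pos = 0;
    long count = 0;

    while (pos < len) {
        uint32_t comp_len;

        if (len - pos < RECORD_SIZE)
            return -1;              /* partial index record */
        comp_len = be32(buf + pos + 8);
        if (len - pos - RECORD_SIZE < comp_len)
            return -1;              /* partial data chunk */
        if ((size_t)count < max)
            positions[count] = pos;
        count++;
        /* Skip the record and the chunk that follows it. */
        pos += RECORD_SIZE + comp_len;
    }
    return count;
}
```

The scan is linear in the number of revisions, which is only tolerable
because inline revlogs are expected to be small.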

Greg

