xrevlog: experimental reimplementation of revlog in C

Greg Ward greg at gerg.ca
Tue Nov 9 08:25:51 CST 2010


Hi folks --

for ages now, it has really bugged me that Mercurial reads the entire
changelog index into memory for almost any command, and that it
creates a Python list *and* dictionary for it.  On our main repo at
work, that's two 112,000-element containers every time you run (say)
"hg tip".

So, back in the summer, I sat down to do something about it.  The
result is xrevlog, a pure C library with a read-only implementation of
revlog as it stands in Mercurial 1.6 (i.e. no support for parent delta
or lightweight copies).  Without further ado, here is the project's
README.txt file:

"""
xrevlog
=======

xrevlog is an experimental reimplementation of Mercurial's revlog format
in pure C, licensed under the same terms as Mercurial itself (GPL v2+).
The main purpose of xrevlog is to experiment with performance/overhead
improvements: by using dedicated data structures written in C, how much
can we reduce revlog's startup overhead relative to general-purpose
Python data structures?

In particular, Mercurial's current revlog implementation reads the
entire index into memory on opening a revlog, and keeps both a list and
a dictionary representing it.  For a large history, that's two large
data structures and a lot of I/O and memory allocation.

My approach with xrevlog has been to take Mercurial's general attitude
to performance and dial it up a notch: delay any I/O or computation
until it's really needed, and cache the result as long as feasible.
Additionally, the use of custom C data structures should reduce runtime
and memory footprint.

If xrevlog happens to produce a general-purpose, reusable, GPL-licensed
pure C library for reading revlogs: great!  That's not my main goal,
though.

Currently, xrevlog only knows how to read the "Revlog NG" format
introduced in Mercurial 0.9, and still the main format used up to
Mercurial 1.7.  There is no support yet for lightweight copies or parent
delta.

Possible to-do items:

  * add support for parsing changelog, manifest log, and filelog
    entries, which would make it possible to implement something
    like "hg log" or "hg cat"

  * implement automated tests to ensure that xrevlog gives the same
    results as the canonical revlog implementation in Mercurial (e.g.
    by comparing output of equivalent programs in Python and C)

  * implement lightweight copies and/or parent delta (for compatibility
    with Mercurial 1.7)

  * add write support

  * forget all of the above and just get xrevlog incorporated into
    Mercurial to reduce startup overhead for large repositories


Building
========

xrevlog requires an ANSI C99 compiler.

To build xrevlog, just run "make".  If that doesn't work on your
platform, please let me know.


Hacking
=======

Contributions are welcome!  See the to-do list above if you're looking
for ideas.  I like Mercurial's guidelines for patch submission, so
please follow them:

  http://mercurial.selenic.com/wiki/ContributingChanges
  http://mercurial.selenic.com/wiki/SuccessfulPatch

You can email trivial patches to me directly (greg at gerg.ca).  If it
looks like it requires discussion, please use mercurial-devel and be
sure to flag the patch "xrevlog", e.g.

  hg email --flag xrevlog ...

I generally follow Mercurial's own coding style, except I use 4-space
indents in C.


Support
=======

Please use the mercurial-devel list for questions about xrevlog.  See
http://mercurial.selenic.com/wiki/MailingLists for details.


API compatibility guarantee
===========================

Absolutely none whatsoever.  Code that you write against xrevlog today
might not even compile tomorrow.


Author and copyright
====================

xrevlog was written by Greg Ward, borrowing heavily from Mercurial code
by Matt Mackall and others.  (I consider it a "derived work" of
Mercurial, and therefore covered by the GPL.)

Copyright (C) 2010 Gregory P. Ward <greg at gerg.ca>

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA


Availability
============

To get the latest code:

  hg clone http://hg.gerg.ca/xrevlog
"""

Please take a look and let me know what you think.


More information about the Mercurial-devel mailing list