Key-value storage [was "add a new packed repository extension"]

Bryan O'Sullivan bos at serpentine.com
Thu Jul 5 12:25:34 CDT 2012


On Wed, Jul 4, 2012 at 8:56 AM, Matt Mackall <mpm at selenic.com> wrote:

>
> Wait, what? Why would we need to update the index?


Laziness. If index lookups were faster than trying to open a file, I'd have
preferred to do that first. As things stand, the complexity involved makes
it not worth trying to do that.

> For a read-only index, the simplest answer is probably bisecting the
> mmap of a sorted table.


I wrote some code that does this, but being pure Python, it's unacceptably
slow. It can write about 230,000 records per second, which isn't bad, but
it reads only 50,000 per second. That's not shabby for an interpreted
language, but it puts it into "noticeable bottleneck" territory.
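
For illustration, here is a minimal sketch of what bisecting an mmap of a
sorted table could look like in Python. It is not the code measured above.
The record layout (a fixed 20-byte key followed by an 8-byte big-endian
integer) and the names write_index and lookup are assumptions made for the
example.

```python
import mmap
import struct

KEY_SIZE = 20                      # assumed: fixed-width keys, e.g. a 20-byte hash
VALUE_FORMAT = ">Q"                # assumed: 8-byte big-endian unsigned integer value
RECORD_SIZE = KEY_SIZE + struct.calcsize(VALUE_FORMAT)


def write_index(path, items):
    """Write (key, value) pairs as fixed-size records, sorted by key."""
    with open(path, "wb") as f:
        for key, value in sorted(items):
            if len(key) != KEY_SIZE:
                raise ValueError("keys must be exactly %d bytes" % KEY_SIZE)
            f.write(key + struct.pack(VALUE_FORMAT, value))


def lookup(table, key):
    """Binary-search a mapped table of sorted records.

    Returns the stored value, or None if the key is absent.
    """
    lo, hi = 0, len(table) // RECORD_SIZE
    while lo < hi:
        mid = (lo + hi) // 2
        start = mid * RECORD_SIZE
        candidate = table[start:start + KEY_SIZE]
        if candidate < key:
            lo = mid + 1
        elif candidate > key:
            hi = mid
        else:
            return struct.unpack_from(VALUE_FORMAT, table, start + KEY_SIZE)[0]
    return None


def open_index(path):
    """Map an existing, non-empty index file read-only."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

A reader would call open_index once and then lookup(table, key) for each
query. Every probe in the loop slices the map and compares byte strings at
the Python level, which is the kind of per-record overhead a C
implementation would avoid.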

There are a couple of things I can do with this knowledge:

   - Use something like cdb directly, and accept that if someone wants the
   extension, they have to install a third party package. (Why not one of
   Python's built-in key-value stores? Not a single one of them is portable
   across a reasonable range of Python versions and platforms. Amazing, but
   true.)
   - Write a C extension for handling these indices. I don't think
   there's a need for a Python fallback, because nobody who cares about this
   performance corner case is going to be running pure Python.