Key-value storage [was "add a new packed repository extension"]
Bryan O'Sullivan
bos at serpentine.com
Thu Jul 5 12:25:34 CDT 2012
On Wed, Jul 4, 2012 at 8:56 AM, Matt Mackall <mpm at selenic.com> wrote:
>
> Wait, what? Why would we need to update the index?
Laziness. If index lookups were faster than trying to open a file, I'd have
preferred to consult the index first. As things stand, the complexity
involved makes it not worth doing.
> For a read-only index, the simplest answer is probably bisecting the
> mmap of a sorted table.
I wrote some code that does this, but being pure Python, it's unacceptably
slow. It can write about 230,000 records per second, which isn't bad, but
it reads only 50,000 per second. That's not shabby for an interpreted
language, but it puts it into "noticeable bottleneck" territory.
There are a couple of things I can do with this knowledge:
- Use something like cdb directly, and accept that if someone wants the
extension, they have to install a third party package. (Why not one of
Python's built-in key-value stores? Not a single one of them is portable
across a reasonable range of Python versions and platforms. Amazing, but
true.)
- Write a C extension for handling these indices. I don't think
there's a need for a Python fallback, because nobody who cares about this
performance corner case is going to be running pure Python.
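For reference, the "bisect the mmap of a sorted table" idea quoted above can be sketched roughly as follows. This is a minimal illustration and not the code that was benchmarked: the record layout (a fixed-width 20-byte key followed by one 64-bit value) and the names write_index, open_index and lookup are assumptions made up for the example.

```python
import mmap
import os
import struct
import tempfile

KEY_SIZE = 20       # assumed: fixed-width keys, e.g. a 20-byte hash
VALUE_FMT = ">Q"    # assumed: one big-endian 64-bit value per key
RECORD_SIZE = KEY_SIZE + struct.calcsize(VALUE_FMT)


def write_index(path, items):
    """Write (key, value) pairs to path as a table sorted by key."""
    with open(path, "wb") as f:
        for key, value in sorted(items):
            if len(key) != KEY_SIZE:
                raise ValueError("keys must be exactly KEY_SIZE bytes")
            f.write(key + struct.pack(VALUE_FMT, value))


def open_index(path):
    """Map the table read-only; the mapping outlives the file object."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def lookup(table, key):
    """Binary-search the mapped table; return the value, or None."""
    lo, hi = 0, len(table) // RECORD_SIZE
    while lo < hi:
        mid = (lo + hi) // 2
        off = mid * RECORD_SIZE
        probe = table[off:off + KEY_SIZE]   # slicing an mmap yields bytes
        if probe < key:
            lo = mid + 1
        elif probe > key:
            hi = mid
        else:
            raw = table[off + KEY_SIZE:off + RECORD_SIZE]
            return struct.unpack(VALUE_FMT, raw)[0]
    return None


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "example.idx")
    keys = [struct.pack(">I", i).rjust(KEY_SIZE, b"\0") for i in range(10)]
    write_index(path, [(k, i * 100) for i, k in enumerate(keys)])
    table = open_index(path)
    print(lookup(table, keys[3]))
    print(lookup(table, b"\xff" * KEY_SIZE))
    table.close()
```

Each lookup costs O(log n) slice-and-compare steps executed in the interpreter, which is the kind of per-record overhead that a C extension or a cdb-style hashed file would avoid.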