Key-value storage [was "add a new packed repository extension"]

Bryan O'Sullivan bos at serpentine.com
Thu Jul 5 23:45:05 CDT 2012


On Thu, Jul 5, 2012 at 2:41 PM, Matt Mackall <mpm at selenic.com> wrote:

> I would expect a workload where you're going to actually
> _visit_ 50k revlogs in a pack (ie a very large update or initial
> checkout) probably takes much more than a second.


That's true. But I don't like losing those precious CPU cycles.

I did make some measurements, to sate my curiosity, writing and reading a
single table with 1.37 million filenames as keys, and the metadata I'd need
for packed repos (offset/size pairs encoded using struct) as values.
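
For illustration only, here is a minimal sketch of what "offset/size pairs
encoded using struct" could look like. The format string (two unsigned
64-bit little-endian integers), the helper names, and the sample filename
are all assumptions; the measured code may well differ.

```python
import struct

# Assumed layout: two unsigned 64-bit little-endian integers (offset, size).
# The format actually used for the measurements is not stated.
ENTRY = struct.Struct("<QQ")


def encode_entry(offset, size):
    """Pack an (offset, size) pair into a fixed-width 16-byte value."""
    return ENTRY.pack(offset, size)


def decode_entry(value):
    """Unpack a value produced by encode_entry back into (offset, size)."""
    return ENTRY.unpack(value)


# A plain dict stands in for the on-disk table in this sketch.
table = {}
table[b"mercurial/revlog.py"] = encode_entry(4096, 1234)
offset, size = decode_entry(table[b"mercurial/revlog.py"])
```

Fixed-width values keep every record the same size, which makes the index
size easy to estimate from the key lengths alone.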

My home-cooked pure Python database manages 188 thousand inserts per
second, and 74 thousand lookups per second (I managed to bum 50% more
performance out of it).

cdb can do 910 thousand inserts per second, and 1302 thousand lookups per
second. The index is about 10% bigger on disk than the Python index,
presumably due to it being intended for general purpose use.
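
Throughput figures like the ones above can be gathered with a very small
timing harness. The following is a hedged sketch, not the benchmark that
produced these numbers: the function name, the synthetic keys, and the use
of an in-memory dict as the store are assumptions made for illustration.

```python
import time


def ops_per_second(func, items):
    """Call func(item) for each item and return the number of calls per second."""
    start = time.perf_counter()
    for item in items:
        func(item)
    elapsed = time.perf_counter() - start
    return len(items) / elapsed if elapsed > 0 else float("inf")


# Synthetic filenames; a real run would use the repository's own file list.
keys = [("dir%d/file%d.txt" % (i % 100, i)).encode() for i in range(100000)]

# An in-memory dict stands in for the store being measured.
table = {}
value = b"\0" * 16  # placeholder for one packed offset/size pair

insert_rate = ops_per_second(lambda k: table.__setitem__(k, value), keys)
lookup_rate = ops_per_second(table.__getitem__, keys)
```

Swapping the dict operations for the insert and lookup calls of another
store would give comparable inserts-per-second and lookups-per-second
figures for that store.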

I prefer the cdb performance numbers (who wouldn't?), but the Python
version is probably acceptable for now: it doesn't introduce an external
dependency, and it will be easy to write a C extension that should perform
better than cdb, should the need arise.