Key-value storage [was "add a new packed repository extension"]

Mon Jul 9 17:07:34 CDT 2012

On Thu, 2012-07-05 at 21:45 -0700, Bryan O'Sullivan wrote:
> On Thu, Jul 5, 2012 at 2:41 PM, Matt Mackall <mpm at selenic.com> wrote:
>          
>         I would expect a workload where you're going to actually
>         _visit_ 50k revlogs in a pack (i.e. a very large update or
>         initial checkout) probably takes much more than a second.
> 
> 
> That's true. But I don't like losing those precious CPU cycles.
> 
> 
> I did make some measurements, to sate my curiosity, writing and
> reading a single table with 1.37 million filenames as keys, and the
> metadata I'd need for packed repos (offset/size pairs encoded using
> struct) as values.
> 
> 
> My home-cooked pure Python database manages 188 thousand inserts per
> second, and 74 thousand lookups per second (I managed to bum 50% more
> performance out of it).
> 
> 
> cdb can do 910 thousand inserts per second, and 1302 thousand lookups
> per second. The index is about 10% bigger on disk than the Python
> index, presumably due to it being intended for general purpose use.

It's not clear from your responses here whether you've implemented my
simple suggestion (bisecting a sorted list) or the cdb approach (list +
256 hash tables). The latter is bigger, of course.
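
For reference, the sorted-list suggestion could be sketched roughly as
follows. This is only an illustration, not code from either
implementation discussed above: the names ENTRY, build_index and lookup
are made up, and the record layout (two unsigned 64-bit integers) is an
assumption about how the offset/size pairs might be encoded with struct.

```python
import bisect
import struct

# Assumed record layout (hypothetical): offset and size stored as two
# unsigned 64-bit big-endian integers.
ENTRY = struct.Struct(">QQ")


def build_index(entries):
    """Build a sorted index from (name, offset, size) tuples.

    Returns two parallel lists: the sorted keys and the packed values.
    """
    ordered = sorted(entries)
    keys = [name for name, _, _ in ordered]
    values = [ENTRY.pack(offset, size) for _, offset, size in ordered]
    return keys, values


def lookup(index, name):
    """Bisect the sorted keys; return (offset, size) or None if absent."""
    keys, values = index
    i = bisect.bisect_left(keys, name)
    if i < len(keys) and keys[i] == name:
        return ENTRY.unpack(values[i])
    return None


index = build_index([("b/file", 10, 5), ("a/file", 0, 10), ("c", 15, 1)])
print(lookup(index, "b/file"))
print(lookup(index, "missing"))
```

A lookup here costs O(log n) key comparisons, so about 21 probes for a
table of 1.37 million filenames, and the index holds nothing beyond the
keys and values themselves.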

-- 
Mathematics is the supreme nostalgia of our time.