xrevlog: experimental reimplementation of revlog in C

Greg Ward greg at gerg.ca
Fri Nov 12 09:07:23 CST 2010


On Wed, Nov 10, 2010 at 10:29 PM, Matt Mackall <mpm at selenic.com> wrote:
>> So I think that custom C data structure should serve both purposes.
>> The interesting part is mapping node IDs to revnums, which I have
>> avoided doing in xrevlog so far.  (Basically it needs a hashtable
>> where the hash function is identity.  Seems silly to spend effort
>> hashing SHA-1 hashes.)
>
> Indeed. My vision is getting the whole index into memory, then providing
> a set of accessors like parentrevs() to get at it without ever
> completely unpacking it.

Yep: that's where I use mmap(), to get the whole index into memory.
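
In rough outline it looks like the sketch below.  This is a simplified
illustration with made-up names rather than the actual xrevlog code; it
assumes the revlogng (v1) index format with fixed 64-byte, big-endian
entries and no inline data, and skips most error handling.

    /* Sketch: map a revlog index into memory and expose a
     * parentrevs()-style accessor over the raw 64-byte entries. */
    #include <arpa/inet.h>    /* ntohl(): index fields are big-endian */
    #include <fcntl.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #define ENTRY_SIZE 64

    typedef struct {
        const unsigned char *data;   /* the mmap()ed index */
        size_t length;               /* size of the mapping */
    } index_t;

    static int index_open(index_t *ix, const char *path)
    {
        struct stat st;
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;
        if (fstat(fd, &st) < 0) {
            close(fd);
            return -1;
        }
        ix->length = (size_t) st.st_size;
        ix->data = mmap(NULL, ix->length, PROT_READ, MAP_SHARED, fd, 0);
        close(fd);                   /* the mapping outlives the fd */
        return ix->data == MAP_FAILED ? -1 : 0;
    }

    /* Read a big-endian 32-bit field at a byte offset within entry rev. */
    static int32_t index_field32(const index_t *ix, int rev, int offset)
    {
        uint32_t v;
        memcpy(&v, ix->data + (size_t) rev * ENTRY_SIZE + offset, 4);
        return (int32_t) ntohl(v);
    }

    /* parentrevs()-style accessor: in the v1 entry layout the two
     * parent revs sit at byte offsets 24 and 28. */
    static void index_parentrevs(const index_t *ix, int rev, int parents[2])
    {
        parents[0] = index_field32(ix, rev, 24);
        parents[1] = index_field32(ix, rev, 28);
    }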

> You did the zlib bits too?

Yes.  I couldn't find a great way to do it though: zlib expects the
caller to know how big the uncompressed data is going to be, but
revlog only stores the size of the *final* data after deltas have been
applied.  So I have a nasty heuristic hack that guesses how big a
buffer to allocate and retries with larger buffers if that turns out
not to be enough.  Ick.  If you have a clone of xrevlog, see
decompress() in lib/revlog.c.  It smells funny, but it seems to work.
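
The retry loop has roughly the following shape (again a simplified
sketch with made-up names, not the real decompress(); the initial
guess factor is arbitrary and purely for illustration):

    /* Decompress a revlog chunk when the exact uncompressed size is
     * unknown: guess a buffer size and keep growing it while zlib's
     * uncompress() reports Z_BUF_ERROR ("output buffer too small"). */
    #include <stdlib.h>
    #include <zlib.h>

    static unsigned char *
    decompress_guess(const unsigned char *src, size_t srclen, size_t *destlen)
    {
        size_t guess = srclen * 4 + 64;     /* arbitrary starting guess */
        unsigned char *buf = NULL;

        for (;;) {
            unsigned char *newbuf = realloc(buf, guess);
            if (newbuf == NULL) {
                free(buf);
                return NULL;
            }
            buf = newbuf;

            uLongf dlen = guess;
            int err = uncompress(buf, &dlen, src, srclen);
            if (err == Z_OK) {
                *destlen = dlen;
                return buf;
            }
            if (err != Z_BUF_ERROR) {       /* real error, give up */
                free(buf);
                return NULL;
            }
            guess *= 2;                     /* too small: double and retry */
        }
    }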

> Hmm, mmap(). Kernel hacker hat on: the conventional wisdom about mmap()
> being the fast way to do things is about 15 years out of date. mmap
> means messing about with page tables to avoid copying data from kernel
> space to user space, and on modern CPUs (read Pentium and later) that
> actually turns out to be _slower than copying the pages in question_ due
> to all the cache flushing that's involved in doing page table
> manipulations. Worse, since you're not directing the I/O, you're much
> more likely to get seeky I/O and miss read-ahead opportunities.

No kidding!  Well, I'm pretty sure that my itch to play around with
mmap() has been kicking around my brain for at least 10 years now.
I'm glad I finally got it out of my system, but crushed to hear that I
missed the boat.  Sigh.

At least now that I have a tool for performance testing, I can compare
my current mmap() implementation with something else.

BTW, my assumption was that using mmap() would largely avoid seeky I/O
because the data is read in by the kernel in 4k blocks, and then I can
jump around willy-nilly and it's about as expensive as any other array
lookup in C.  (Is there a cost to accessing kernel memory from user
space?)

Seems to me like there are three ways to do it:

1) when the caller wants a record, lseek() to that point, read() 64 bytes,
and parse them
2) read the entire index into memory unparsed, and parse it as needed
3) read blocks of the index into memory as needed (try to reduce I/O
and memory use)

#1 sounds awful: lots of system calls, lots of seeks, lots of small reads.
#2 is what mmap() makes it look like, so it would mean less change to
my code.  But it wastes memory.
#3 is my naive understanding of what mmap() does for me under the hood.

Recommendations?  My inclination is #2 for inline data and #3 for
external data.  (I don't think #3 makes much sense for inline data.)
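
For what it's worth, #2 would mostly just swap the mmap() call in the
earlier sketch for a single read() into a malloc()ed buffer, with the
accessors on top of it unchanged.  Another made-up sketch, with error
handling abbreviated:

    /* Option #2: slurp the whole index into a private buffer, then
     * parse 64-byte entries out of it on demand exactly as the
     * mmap() version does. */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static unsigned char *read_whole_index(const char *path, size_t *length)
    {
        struct stat st;
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return NULL;
        if (fstat(fd, &st) < 0) {
            close(fd);
            return NULL;
        }

        unsigned char *buf = malloc((size_t) st.st_size);
        size_t done = 0;
        while (buf != NULL && done < (size_t) st.st_size) {
            ssize_t n = read(fd, buf + done, (size_t) st.st_size - done);
            if (n <= 0) {                   /* error or short file */
                free(buf);
                buf = NULL;
                break;
            }
            done += (size_t) n;
        }
        close(fd);
        if (buf != NULL)
            *length = done;
        return buf;
    }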

>> Anyone know a good hashtable library for C?  ;-)
>
> There's a thin hashtable implementation in bdiff.c that should be plenty
> adequate. Hashes should be very nicely distributed, but you'll probably
> still need 2n buckets to avoid excessive collisions. Fortunately, each
> bucket can simply contain a pointer to the index entry, no need to store
> the key in two places.

Ahh, thanks.  Part of me (the stupid, idealistic part) was actually
looking forward to cracking open my undergrad data structures book and
writing a little hashtable for fun.  My practical side now has an
excuse not to do that.
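
For the record, here is the sort of thing I have in mind for the
node-to-revnum table, under the assumptions above: the keys are already
SHA-1 hashes, so any four bytes of the node serve as the hash, and with
at least 2n buckets the table stays under half full.  Made-up names
again, and the buckets here hold rev numbers rather than raw pointers
to index entries, but the idea is the same:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    #define NODE_LEN 20

    /* Caller supplies node_of(rev, arg): a pointer to the 20-byte node
     * of rev, e.g. ix->data + rev * ENTRY_SIZE + 32 in the mmap sketch
     * earlier in this mail. */
    typedef const unsigned char *(*node_of_fn)(int rev, void *arg);

    typedef struct {
        int *revs;          /* bucket -> rev number, or -1 for empty */
        size_t mask;        /* nbuckets - 1; nbuckets is a power of two */
    } nodemap_t;

    static uint32_t node_hash(const unsigned char *node)
    {
        uint32_t h;
        memcpy(&h, node, 4);        /* any 4 bytes of a SHA-1 will do */
        return h;
    }

    static int nodemap_build(nodemap_t *nm, int nrevs,
                             node_of_fn node_of, void *arg)
    {
        size_t nbuckets = 1;
        while (nbuckets < (size_t) nrevs * 2)   /* at least 2n buckets */
            nbuckets *= 2;
        nm->mask = nbuckets - 1;
        nm->revs = malloc(nbuckets * sizeof(int));
        if (nm->revs == NULL)
            return -1;
        for (size_t i = 0; i < nbuckets; i++)
            nm->revs[i] = -1;

        for (int rev = 0; rev < nrevs; rev++) {
            size_t b = node_hash(node_of(rev, arg)) & nm->mask;
            while (nm->revs[b] != -1)           /* linear probing */
                b = (b + 1) & nm->mask;
            nm->revs[b] = rev;
        }
        return 0;
    }

    static int nodemap_lookup(const nodemap_t *nm, const unsigned char *node,
                              node_of_fn node_of, void *arg)
    {
        size_t b = node_hash(node) & nm->mask;
        while (nm->revs[b] != -1) {
            if (memcmp(node_of(nm->revs[b], arg), node, NODE_LEN) == 0)
                return nm->revs[b];
            b = (b + 1) & nm->mask;             /* keep probing */
        }
        return -1;                              /* unknown node */
    }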

Greg

