xrevlog: experimental reimplementation of revlog in C

Mon Nov 15 13:45:40 CST 2010

On Sun, 2010-11-14 at 12:23 +0100, Adrian Buehlmann wrote:
> > On our main repo at
> > work, that's two 112,000-element containers every time you run (say)
> > "hg tip".
> 
> That's 112,000 x 64 bytes = 6.7 MB index file which seems to be parsed
> quite quickly by the C code in parsers.c (I guess that's function
> parse_index there).

Not really. The index itself is 112k Python tuples. Each tuple contains
7 Python integers and one 20 byte Python string. Each Python integer is
8 bytes. And those 20 byte strings when wrapped in a Python object and
allocated individually through Python's allocator, may take 64 bytes or
more.

In fact, simply measuring with top and

b = [(0, 0, 0, 0, 0, 0, 0, "%020x" % i) for i in xrange(1000000)]

suggests that each index entry is taking about 195 bytes. That breaks
down to something like:

  8 bytes per index entry pointer
  64 bytes per tuple
  7 * 8 bytes per integer
  64 bytes per hash
  ---
  192 bytes per entry + 9 memory allocations

There's a similar story with the hash side of things. If we read the
index into memory and only parse it on demand, we can be significantly
smaller and faster.

-- 
Mathematics is the supreme nostalgia of our time.