[PATCH 0 of 3] Fast implementation of parseindex in C

Bernhard Leiner mailinglists.bleiner at gmail.com
Mon Oct 13 21:53:51 UTC 2008


2008/10/11 Matt Mackall <mpm at selenic.com>:
> On Sat, 2008-10-11 at 19:00 +0200, Bernhard Leiner wrote:
>> Since for large indexes with no inlined data the number of entries is
>> known in advance it is possible to create the index list with the
>> correct size in advance and just use PyList_SET_ITEM (which runs in
>> O(1)) afterwards. I now have a speedup of about 6, also for very large
>> indexes.
>
> Great news.

Back again, this time with not so good news. I'm down to a speedup of
~3 for big index files :-(

The performance impact of using PyList_SET_ITEM was only about 20%.
The rest was gained accidentally through incorrect reference counting,
which resulted in less garbage collector interference afterwards [*].
I will send the latest version of my patches to the list...
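For reference, the preallocation pattern from the quoted mail above (create the list at its final size, then store entries by index in O(1)) corresponds to PyList_New() plus PyList_SET_ITEM() in the C API; note that PyList_SET_ITEM steals a reference, which is easy to get wrong. The same idea sketched at the Python level, with hypothetical helper names:

```python
def parse_entries_append(raw_entries):
    # Grow-as-you-go: the list may reallocate as it grows,
    # analogous to repeated PyList_Append at the C level.
    index = []
    for entry in raw_entries:
        index.append(entry)
    return index

def parse_entries_prealloc(raw_entries, n):
    # Size known in advance: allocate once, then O(1) stores by index,
    # analogous to PyList_New(n) + PyList_SET_ITEM at the C level.
    index = [None] * n
    for i, entry in enumerate(raw_entries):
        index[i] = entry
    return index

entries = list(range(1000))
assert parse_entries_append(entries) == parse_entries_prealloc(entries, len(entries))
```

This only works when the number of entries is known up front, which is the case for a revlog index without inlined data.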



[*] Just for the record: measuring performance of a function that
returns a big data structure like this seems like a bad idea:

import time

timing_results = []
for x in xrange(100):
    start = time.time()
    res = function_that_returns_a_lot_of_data()
    stop = time.time()
    timing_results.append(stop - start)

As soon as 'res' is overwritten, a _lot_ of objects are freed, the
garbage collector kicks in, and the following time measurement is
delayed. At least that is my current theory after staring at strange
timing results for too long.
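One way to keep the collector out of the timed window (a sketch, not from the original thread; the data-producing function is a stand-in) is to disable automatic collection and collect explicitly before each measurement, so the cleanup cost is paid outside the timing:

```python
import gc
import time

def function_that_returns_a_lot_of_data():
    # Stand-in for a parser that builds a large result structure.
    return [(i, str(i)) for i in range(10000)]

timing_results = []
gc.disable()            # no automatic collections during the timed runs
try:
    res = None
    for x in range(100):
        gc.collect()    # free the previous 'res' *before* the window opens
        start = time.time()
        res = function_that_returns_a_lot_of_data()
        stop = time.time()
        timing_results.append(stop - start)
finally:
    gc.enable()
```

With this arrangement each measurement covers only the function call itself, not the deallocation of the previous result.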

--
Bernhard Leiner           http://bernh.net
