[WIP] lazy revlog index parsing

Bryan O'Sullivan bos at serpentine.com
Thu Mar 15 02:19:54 CDT 2012


On Wed, Mar 14, 2012 at 11:50 PM, Matt Mackall <mpm at selenic.com> wrote:

>
> [I'll just note in passing the irony of the author of patchbomb using a
> pastebin (on github of all places).]
>

Indeed. I just sent a refreshed version of the patch with all of the
WIP-ness cleaned up. It passes all the tests.


> But it doesn't quite actually work for me, something is screwy with the
> offset handling that's causing a seek to get an invalid arg.


Could you try with the patch I just sent, please? Note that I just updated
it (after I sent it) to include <arpa/inet.h>, so you might need to re-add
that.

Ok, now tip is still not much faster: it goes from 0.401s to 0.320s on
> my test repo. Turns out the bulk of the time here is spent checking that
> 270 tags point to valid nodes. And doing node->rev mapping is the other
> thing that could really stand to be optimized. Disable the valid node
> check and it now takes .069s (compared to perfstartup of .025s).
>

I could synthesize a repo and look into that, or perhaps you could send me
a pointer to your carefully tweaked one?

As far as approach is concerned, the obvious thing to do is build a dict
and look it up, but that has poor performance if there are lots of revs.

The next obvious approach is to dump all the (node ID,rev) pairs into a
single big malloced buffer, sort, and binary search for lookups. That's
almost certain to be faster, but it's still a lot of work up front, so the
startup cost might be unpleasant. If it was found to matter, cost of
sorting could be amortized over multiple lookups (as the current revlog.rev
implementation does) using a gapped insertion sort, maybe.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20120315/c311a251/attachment.html>


More information about the Mercurial-devel mailing list