[PATCH 2 of 2] [WIP] parsers: use base-16 trie for faster node->rev mapping

Bryan O'Sullivan bos at serpentine.com
Fri Apr 6 01:54:12 CDT 2012


On Thu, Mar 29, 2012 at 1:17 PM, Matt Mackall <mpm at selenic.com> wrote:

> Notably, the old Python lazy index from 1.7 is actually faster here:
>
> ! wall 0.008514 comb 0.000000 user 0.000000 sys 0.000000 (best of 331)
>
> http://www.selenic.com/hg/file/333421b9e0f9/mercurial/revlog.py#l200

...and I just figured out why.

During a search for the rev corresponding to a node:

The old Python lazy index caches at most one node->rev mapping: the match
for the desired node.

The C version caches *every* node->rev mapping while it's walking the
index, and stops when it finds a match.

Sure enough, using perf to profile a run, most of the time is spent in
nt_insert.

I tweaked perfnodelookup to load a revlog once, then benchmark the cost of
clearing the node->rev cache and looking up a node:

def perfnodelookup(ui, repo, rev):
>    import mercurial.revlog
>    mercurial.revlog._prereadsize = 2**24 # disable lazy parser in old hg
>    n = repo[rev].node()
>    cl = mercurial.revlog.revlog(repo.sopener, "00changelog.i")
>    def d():
>        cl.rev(n)
>        cl.clearcaches()
>    timer(d)


This gets us closer to measuring just the cost of a node lookup.

Results of "hg perfnodelookup 0" on a linux-2.6 tree:

Current Python cache: 161 msec
C trie that caches everything: 35.2 msec
C trie that caches only the match: 4.04 msec

These are worst-case numbers in each case, since the entire index must be
scanned before a match is found. Performance is nearly linear in the number
of revs scanned in each case.

We can't use the "only cache the match" strategy all the time, because it
loses badly for e.g. log, even compared to the Python code (~10x slower). I
think the best thing would be to use the same approach as the old Python
code: scan in toto a few times, then cache everything if more accesses
arrive. That should make both the few- and many-accesses cases fast.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20120405/28a32138/attachment.html>


More information about the Mercurial-devel mailing list