[PATCH 2 of 2] [WIP] parsers: use base-16 trie for faster node->rev mapping

Mon Mar 26 20:34:40 CDT 2012

On 26 March 2012, I said:
> On 22 March 2012, Bryan O'Sullivan said:
> > This vastly speeds up node->rev lookups: "hg --time log" of rev
> > 1000 on a linux-2.6 repo improves from 0.27 seconds to 0.08.
> 
> Strange. I've applied both of your patches and done extensive
> performance testing with a large repo (128k revisions) on my old, slow
> laptop ... and I'm seeing no change at all. Nothing. Not a sausage.
> Bugger all.
[...]
> 
> I'll try again in the morning... I'm probably missing something big
> and obvious.

...or in 15 minutes. Memo to myself: shell functions ignore shell
aliases. When you want to be damn sure you know which "hg" you are
running, modify PATH. That's the only way. OK, that and use the full
path to the script, but that's boring.
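The same check is easy to script. Below is a minimal Python sketch, assuming a POSIX system. `env_with_hg_first` and `resolve_hg` are hypothetical helper names, and the directory you pass in is wherever your patched hg lives.

```python
import os
import shutil


def env_with_hg_first(hg_dir, base_env=None):
    """Return a copy of the environment with hg_dir at the front of PATH.

    Hypothetical helper: pass the result as env= to subprocess.run().
    subprocess (without shell=True) never consults shell aliases or
    functions, so PATH alone decides which hg gets run.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PATH"] = hg_dir + os.pathsep + env.get("PATH", "")
    return env


def resolve_hg(env):
    """Return the full path of the 'hg' this PATH resolves to, or None."""
    return shutil.which("hg", path=env["PATH"])
```

Printing `resolve_hg(env)` before a benchmark run confirms which hg is actually being timed.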

Anyway, I got some results this time. I measured elapsed time and
peak memory by running each command under

  /usr/bin/time -f "%E s elapsed, %M kB max"

I ran each test 5 times and reported the shortest runtime along with
the peak memory from that same run. (That was not always the smallest
peak memory of the five, but the variance in peak memory usage was
small.)
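For anyone repeating these runs without GNU time, here is a rough Python equivalent of the best-of-5 procedure. It is a sketch, assuming Linux (where `ru_maxrss` is reported in kB) and Python 3.8+ for `os.posix_spawnp`; `measure` and `best_of` are hypothetical names, not part of Mercurial.

```python
import os
import sys
import time


def measure(cmd):
    """Run cmd once; return (elapsed_seconds, peak_rss_kb) for that run only.

    os.wait4() reaps this one child and returns its own resource usage,
    the same kind of per-process peak figure GNU time reports as %M.
    On Linux ru_maxrss is in kilobytes (macOS reports bytes).
    """
    start = time.monotonic()
    pid = os.posix_spawnp(cmd[0], cmd, os.environ)
    _, status, usage = os.wait4(pid, 0)
    elapsed = time.monotonic() - start
    if not os.WIFEXITED(status) or os.WEXITSTATUS(status) != 0:
        raise RuntimeError(f"{cmd!r} failed (wait status {status})")
    return elapsed, usage.ru_maxrss


def best_of(cmd, runs=5):
    """Run cmd several times; keep the fastest run and that run's peak memory."""
    results = [measure(cmd) for _ in range(runs)]
    return min(results, key=lambda r: r[0])
```

Calling `best_of(["hg", "tip"])` returns a (seconds, kB) pair comparable to the numbers below. Keeping the memory figure from the fastest run, rather than the minimum memory across runs, matches how the results were reported.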

Results:

test 1: hg tip
  unpatched: 0:00.89 s elapsed, 39304 kB max
  patched:   0:00.58 s elapsed, 21360 kB max

test 2: hg log -v -r 50000 --style custom.style
  [my custom.style uses {file_adds}, {file_mods}, and {file_dels}, 
   forcing manifest reads]
  unpatched: 0:01.48 s elapsed, 68788 kB max
  patched:   0:00.93 s elapsed, 37684 kB max

test 3: hg log -v -r 2978b0d97fcf --style custom.style
  [same changeset as -r 50000]
  unpatched: 0:01.68 s elapsed, 68832 kB max
  patched:   0:01.25 s elapsed, 37744 kB max

test 4: hg log -v -r 2978b0d97fcfa2d5e6a251120772c8ff6b50f6f8 --style custom.style
  [same again, this time no need for prefix lookup]
  unpatched: 0:01.48 s elapsed, 68792 kB max
  patched:   0:00.93 s elapsed, 37680 kB max

test 5: hg diff -c 128362
  [big merge:  2525 files changed, 143973 insertions(+), 85114 deletions(-)]
  unpatched: 0:16.00 s elapsed, 78644 kB max
  patched:   0:17.15 s elapsed, 63884 kB max
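Tests 3 and 4 differ only in whether hg has to resolve an abbreviated hash. To make the idea behind the patch concrete, here is a toy Python sketch of a base-16 trie with unique-prefix lookup. It is not the C code in the patch: it works on lowercase hex strings rather than binary nodes, and `NodeTrie` and `AmbiguousPrefixError` are made-up names.

```python
class AmbiguousPrefixError(LookupError):
    """Raised when a hex prefix matches more than one node."""


class NodeTrie:
    """Illustrative base-16 trie from hex node IDs to revision numbers.

    Each interior node is a list of 16 slots, one per hex digit. A slot is
    None (empty), another 16-slot list (interior), or a (hexnode, rev) leaf.
    Leaves sit as high as possible and are pushed down one level at a time
    only when a new node shares their prefix. Assumes equal-length IDs.
    """

    def __init__(self):
        self.root = [None] * 16

    def insert(self, hexnode, rev):
        hexnode = hexnode.lower()
        level, depth = self.root, 0
        while True:
            i = int(hexnode[depth], 16)
            slot = level[i]
            if slot is None:
                level[i] = (hexnode, rev)
                return
            if isinstance(slot, tuple):
                if slot[0] == hexnode:
                    level[i] = (hexnode, rev)  # same node: just update rev
                    return
                # Both nodes share this digit: move the old leaf one level down.
                child = [None] * 16
                child[int(slot[0][depth + 1], 16)] = slot
                level[i] = child
                slot = child
            level, depth = slot, depth + 1

    def lookup(self, prefix):
        """Return the rev of the unique node whose hex ID starts with prefix."""
        prefix = prefix.lower()
        if not prefix:
            raise ValueError("empty prefix")
        level = self.root
        for ch in prefix:
            slot = level[int(ch, 16)]
            if slot is None:
                raise KeyError(prefix)
            if isinstance(slot, tuple):
                # A leaf ends the walk; it must still match the whole prefix.
                if slot[0].startswith(prefix):
                    return slot[1]
                raise KeyError(prefix)
            level = slot
        # Prefix used up at an interior node: two or more nodes share it.
        raise AmbiguousPrefixError(prefix)
```

Resolving `2978b0d97fcf` walks at most one level per hex digit until it reaches a leaf, then checks the stored full hash against the prefix; a full 40-digit hash takes the same path, so the trie serves both kinds of lookup.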

Interesting that the big diff got slower. On reflection, that makes
sense: we read two changelog entries, two manifest entries, and 3475
filelog entries. (That changeset modified 837 files, added 1171, and
removed 630, which I figure means 837*2 + 1171 + 630 = 3475 filelog
revs to read.) Filelogs are usually small and probably don't gain
much from lazy index parsing (you found a 1% penalty). I think we're
seeing another manifestation of that penalty here.
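The filelog count above, written as a tiny Python function under the same assumption: a modified file needs the filelog revision from both sides of the diff, while an added or removed file needs only one. The function name is made up.

```python
def filelog_revs_to_read(modified, added, removed):
    """Estimate filelog revisions read by 'hg diff -c' for one changeset.

    Assumes a modified file reads two revisions (old and new) and an
    added or removed file reads one.
    """
    return 2 * modified + added + removed


# 837 modified, 1171 added, 630 removed (the merge in test 5).
print(filelog_revs_to_read(837, 1171, 630))  # 3475
```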

        Greg