[PATCH] Experiment: On demand calculation of lasttag & distance instead of pre-calculation

Sat Aug 29 03:05:12 CDT 2009

On Sat August 29 2009 03:28:48 Mads Kiilerich wrote:
> # HG changeset patch
> # User Gilles Moris <gilles.moris at free.fr>
> # Date 1247087563 -7200
> # Node ID 7e9e9b38720584274927cc0c0ebcbdf4ff502357
> # Parent  7345fa5e572e029362bc6f2f96af452607ebb1ff
> Experiment: On demand calculation of lasttag & distance instead of pre-calculation
> 
> This is an experiment of tweaking Gilles algorithm to get good performance in
> more cases.
> 
> Computing lasttag for N tags with pre-calculation is O(N + repo-size), but for
> practical purposes it can be considered O(repo-size). Pretty good for large N
> but bad for N=1.
> 
> By using the mapping as a cache and populate it on demand we can make single
> lookup O(longest-tag-distance) while keeping an upper bound for N lookups on
> O(N + repo-size). For practical purposes with 1 << N << repo-size and
> well-placed tags it can perhaps be considered O(N).
> 
> Some quick measurements on the Mercurial repo:
> 
> Pre-calculation:
> [mk at localhost hg]$ time ./hg parent --template '{rev}: {lasttag}+{lasttagdistance}\n'
> 9406: 1.3.1+135
> 
> real	0m0.598s
> user	0m0.357s
> sys	0m0.068s
> 
> [mk at localhost hg]$ time ./hg log --template '{rev}: {lasttag}+{lasttagdistance}\n' > /dev/null
> 
> real	0m3.606s
> user	0m3.099s
> sys	0m0.112s
> 
> On demand:
> 
> [mk at localhost hg]$ time ./hg parent --template '{rev}: {lasttag}+{lasttagdistance}\n'
> 9406: 1.3.1+135
> 
> real	0m0.399s
> user	0m0.149s
> sys	0m0.071s
> 
> [mk at localhost hg]$ time ./hg log --template '{rev}: {lasttag}+{lasttagdistance}\n' > /dev/null
> 
> real	0m3.977s
> user	0m3.414s
> sys	0m0.125s
> 
> Reference:
> 
> [mk at localhost hg]$ time ./hg parent --template '{rev}:\n'
> 9406:
> 
> real	0m0.347s
> user	0m0.089s
> sys	0m0.080s
> 
> [mk at localhost hg]$ time ./hg log --template '{rev}:\n' > /dev/null
> 
> real	0m2.909s
> user	0m2.434s
> sys	0m0.117s
> 
> I assume that testing on bigger repos will show that "on demand" wins much more
> significant for single lookups. "On demand" is probably slower for many
> lookups, and it is probably possible to optimize it to be less slower, but how
> much does that matter? How often will N be close to repo-size for big repos?
> 

On Netbeans 6.7, pre-caculation:

% time ./hg -R ../netbeans_release67_fixes parent --template '{rev}: {lasttag}+{lasttagdistance}\n'
133765: release67_fixes_base+200
3.433u 0.662s 0:04.36 93.8%     0+0k 0+0io 0pf+0w

% time ./hg -R ../netbeans_release67_fixes log --template '{rev}: {lasttag}+{lasttagdistance}\n' > /dev/null
35.592u 1.319s 0:38.81 95.0%    0+0k 0+0io 0pf+0w

On demand:

% time ./hg parent --template '{rev}: {lasttag}+{lasttagdistance}\n' -R ../netbeans_release67_fixes
133765: release67_fixes_base+200
0.418u 0.577s 0:01.00 98.0%     0+0k 0+0io 0pf+0w

% time ./hg log --template '{rev}: {lasttag}+{lasttagdistance}\n' -R ../netbeans_release67_fixes > /dev/null
38.515u 1.394s 0:42.60 93.6%    0+0k 0+0io 0pf+0w

Reference:

% time ./hg parent --template '{rev}:\n' -R ../netbeans_release67_fixes
133765:
0.381u 0.610s 0:01.00 99.0%     0+0k 0+0io 0pf+0w

% time ./hg log --template '{rev}:\n' -R ../netbeans_release67_fixes > /dev/null
27.617u 1.221s 0:29.54 97.5%    0+0k 0+0io 0pf+0w

So it seems, we have a clear winner here, and that your experiment transformed
the try.

> A recursive on-demand implementation would be simpler, but allocating
> latest-tag-distance frames on the stack is not acceptable.
> 
> The on-demand method can probably be optimized by using pruning if we may
> assume that changeset timestamps are ever-increasing.
> 

I don't think this is a wise assumption. You are fast enough to stay safe.

Regards.
Gilles.