[PATCH] Experiment: On demand calculation of lasttag & distance instead of pre-calculation

Mads Kiilerich mads at kiilerich.com
Fri Aug 28 20:28:48 CDT 2009


# HG changeset patch
# User Gilles Moris <gilles.moris at free.fr>
# Date 1247087563 -7200
# Node ID 7e9e9b38720584274927cc0c0ebcbdf4ff502357
# Parent  7345fa5e572e029362bc6f2f96af452607ebb1ff
Experiment: On demand calculation of lasttag & distance instead of pre-calculation

This is an experiment of tweaking Gilles algorithm to get good performance in
more cases.

Computing lasttag for N tags with pre-calculation is O(N + repo-size), but for
practical purposes it can be considered O(repo-size). Pretty good for large N
but bad for N=1.

By using the mapping as a cache and populate it on demand we can make single
lookup O(longest-tag-distance) while keeping an upper bound for N lookups on
O(N + repo-size). For practical purposes with 1 << N << repo-size and
well-placed tags it can perhaps be considered O(N).

Some quick measurements on the Mercurial repo:

Pre-calculation:
[mk at localhost hg]$ time ./hg parent --template '{rev}: {lasttag}+{lasttagdistance}\n'
9406: 1.3.1+135

real	0m0.598s
user	0m0.357s
sys	0m0.068s

[mk at localhost hg]$ time ./hg log --template '{rev}: {lasttag}+{lasttagdistance}\n' > /dev/null

real	0m3.606s
user	0m3.099s
sys	0m0.112s

On demand:

[mk at localhost hg]$ time ./hg parent --template '{rev}: {lasttag}+{lasttagdistance}\n'
9406: 1.3.1+135

real	0m0.399s
user	0m0.149s
sys	0m0.071s

[mk at localhost hg]$ time ./hg log --template '{rev}: {lasttag}+{lasttagdistance}\n' > /dev/null

real	0m3.977s
user	0m3.414s
sys	0m0.125s

Reference:

[mk at localhost hg]$ time ./hg parent --template '{rev}:\n'
9406:

real	0m0.347s
user	0m0.089s
sys	0m0.080s

[mk at localhost hg]$ time ./hg log --template '{rev}:\n' > /dev/null

real	0m2.909s
user	0m2.434s
sys	0m0.117s

I assume that testing on bigger repos will show that "on demand" wins much more
significant for single lookups. "On demand" is probably slower for many
lookups, and it is probably possible to optimize it to be less slower, but how
much does that matter? How often will N be close to repo-size for big repos?

A recursive on-demand implementation would be simpler, but allocating
latest-tag-distance frames on the stack is not acceptable.

The on-demand method can probably be optimized by using pruning if we may
assume that changeset timestamps are ever-increasing.

diff --git a/mercurial/cmdutil.py b/mercurial/cmdutil.py
--- a/mercurial/cmdutil.py
+++ b/mercurial/cmdutil.py
@@ -743,6 +743,7 @@
                                          'parent': '{rev}:{node|formatnode} ',
                                          'manifest': '{rev}:{node|formatnode}',
                                          'filecopy': '{name} ({source})'})
+        self._latestrevtagcache = {-1: (0, 0, 'null')}
 
     def use_template(self, t):
         '''set template string to use'''
@@ -760,6 +761,27 @@
             return []
         return parents
 
+    def _latestrevtag(self, rev):
+        todo = [rev]
+        while todo:
+            rev = todo.pop()
+            if rev in self._latestrevtagcache:
+                continue
+            ctx = self.repo[rev]
+            tags = [t for t in ctx.tags() if self.repo.tagtype(t) == 'global']
+            if tags:
+                self._latestrevtagcache[rev] = ctx.date()[0], 0, ':'.join(sorted(tags))
+                continue
+            prevs = [p.rev() for p in ctx.parents()]
+            try:
+                pdate, pdist, ptag = max(self._latestrevtagcache[prev] for prev in prevs)
+            except KeyError:
+                todo.append(rev)
+                todo.extend(prevs)
+                continue
+            self._latestrevtagcache[rev] = pdate, pdist + 1, ptag
+        return self._latestrevtagcache[rev]
+
     def _show(self, ctx, copies, props):
         '''show a single changeset or file revision'''
 
@@ -877,6 +899,11 @@
                 removes += i[2]
             return '%s: +%s/-%s' % (files, adds, removes)
 
+        def showlasttag(**args):
+            return self._latestrevtag(ctx.rev())[2]
+        def showlasttagdistance(**args):
+            return self._latestrevtag(ctx.rev())[1]
+
         defprops = {
             'author': ctx.user(),
             'branches': showbranches,
@@ -894,6 +921,8 @@
             'tags': showtags,
             'extras': showextras,
             'diffstat': showdiffstat,
+            'lasttag': showlasttag,
+            'lasttagdistance': showlasttagdistance,
             }
         props = props.copy()
         props.update(defprops)
diff --git a/mercurial/help.py b/mercurial/help.py
--- a/mercurial/help.py
+++ b/mercurial/help.py
@@ -393,6 +393,9 @@
                 number.
     :tags:      List of strings. Any tags associated with the
                 changeset.
+    :lasttag:   String. Most recent global tag in the ancestors of this
+                changeset.
+    :lasttagdistance: Integer. Longest path to the last tag.
 
     The "date" keyword does not produce human-readable output. If you
     want to use a date in your output, you can use a filter to process
diff --git a/tests/test-command-template b/tests/test-command-template
--- a/tests/test-command-template
+++ b/tests/test-command-template
@@ -127,4 +127,49 @@
 echo 'x = "f' >> t
 hg log
 
+cd ..
+
+echo '# lasttag'
+hg init lasttag
+cd lasttag
+
+echo a > file
+hg ci -Am a -d '0 0'
+
+echo b >> file
+hg ci -m b -d '1 0'
+
+echo c >> head1
+hg ci -Am h1c -d '2 0'
+
+hg update -q 1
+echo d >> head2
+hg ci -Am h2d -d '3 0'
+
+echo e >> head2
+hg ci -m h2e -d '4 0'
+
+hg merge -q
+hg ci -m merge -d '5 0'
+
+echo '# No tag set'
+hg log --template '{rev}: {lasttag}+{lasttagdistance}\n'
+
+echo '# one common tag: longuest path wins'
+hg tag -r 1 -m t1 -d '6 0' t1
+hg log --template '{rev}: {lasttag}+{lasttagdistance}\n'
+
+echo '# one ancestor tag: more recent wins'
+hg tag -r 2 -m t2 -d '7 0' t2
+hg log --template '{rev}: {lasttag}+{lasttagdistance}\n'
+
+echo '# two branch tags: more recent wins'
+hg tag -r 3 -m t3 -d '8 0' t3
+hg log --template '{rev}: {lasttag}+{lasttagdistance}\n'
+
+echo '# merged tag overrides'
+hg tag -r 5 -m t5 -d '9 0' t5
+hg tag -r 3 -m at3 -d '10 0' at3
+hg log --template '{rev}: {lasttag}+{lasttagdistance}\n'
+
 echo '# done'
diff --git a/tests/test-command-template.out b/tests/test-command-template.out
--- a/tests/test-command-template.out
+++ b/tests/test-command-template.out
@@ -672,4 +672,55 @@
 1e4e1b8f71e05681d422154f5421e385fec3454f
 # error on syntax
 abort: t:3: unmatched quotes
+# lasttag
+adding file
+adding head1
+adding head2
+created new head
+# No tag set
+5: null+5
+4: null+4
+3: null+3
+2: null+3
+1: null+2
+0: null+1
+# one common tag: longuest path wins
+6: t1+4
+5: t1+3
+4: t1+2
+3: t1+1
+2: t1+1
+1: t1+0
+0: null+1
+# one ancestor tag: more recent wins
+7: t2+3
+6: t2+2
+5: t2+1
+4: t1+2
+3: t1+1
+2: t2+0
+1: t1+0
+0: null+1
+# two branch tags: more recent wins
+8: t3+5
+7: t3+4
+6: t3+3
+5: t3+2
+4: t3+1
+3: t3+0
+2: t2+0
+1: t1+0
+0: null+1
+# merged tag overrides
+10: t5+5
+9: t5+4
+8: t5+3
+7: t5+2
+6: t5+1
+5: t5+0
+4: at3:t3+1
+3: at3:t3+0
+2: t2+0
+1: t1+0
+0: null+1
 # done


More information about the Mercurial-devel mailing list