Quick way to find added/modified/removed across revisions?
Jesper Noehr
jesper at noehr.org
Tue Jul 1 08:03:37 CDT 2008
Hi list,
I'm trying to sort out a quick way to figure out how many files were
added/modified/removed across a change. I want to display these
numbers on the shortlog page together with the date, description and
author. My first try was calling repo.status(), which is a very heavy
operation. Next, I tried reading the manifest of the cset together
with the manifest of its parent and doing a cheap comparison on
those. This also turned out to be extremely expensive (20 seconds for
25 revisions!).
Here's my code:
from mercurial import hg, node, ui

repo_path = '/path/to/a/large/repo'

if __name__ == "__main__":
    u = ui.ui()
    r = hg.repository(u, repo_path)
    cl = r.changelog
    tip = r.changectx('tip')
    # Last 25 revisions, tip included (xrange excludes its upper bound).
    ctx_range = xrange(cl.count() - 25, tip.rev() + 1)
    for cnum in ctx_range:
        ctx = r.changectx(cnum)
        parent = ctx.parents()[0]
        # Each manifest() call reads and parses a full manifest.
        this_manifest = ctx.manifest()
        parent_manifest = parent.manifest()
        added, modified, removed = 0, 0, 0
        for file_ in ctx.files():
            in_this = file_ in this_manifest
            in_parent = file_ in parent_manifest
            if in_this and in_parent:
                modified += 1
            elif in_this:
                added += 1
            else:
                removed += 1
        short = node.short(ctx.node())
        print ("in cset %s: added=%d, modified=%d, removed=%d"
               % (short, added, modified, removed))
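The counting itself is cheap once both manifests are in memory. Isolated from Mercurial, it amounts to roughly the sketch below. Plain dicts stand in for the manifests, and count_changes is a hypothetical helper name used only for illustration, not part of the Mercurial API.

```python
def count_changes(touched, this_manifest, parent_manifest):
    """Classify each touched file by membership in the two manifests.

    touched         -- file names listed by the changeset
    this_manifest   -- mapping of file name -> node for the changeset
    parent_manifest -- mapping of file name -> node for its first parent
    Returns a tuple (added, modified, removed).
    """
    added = modified = removed = 0
    for name in touched:
        in_this = name in this_manifest
        in_parent = name in parent_manifest
        if in_this and in_parent:
            modified += 1      # present on both sides
        elif in_this:
            added += 1         # new in this changeset
        else:
            removed += 1       # no longer present in this changeset
    return added, modified, removed


if __name__ == "__main__":
    # Stand-in data: plain dicts instead of real manifests.
    parent = {"a.txt": "n1", "b.txt": "n2"}
    this = {"a.txt": "n3", "c.txt": "n4"}
    print(count_changes(["a.txt", "b.txt", "c.txt"], this, parent))
```

With the stand-in data, a.txt counts as modified, b.txt as removed and c.txt as added. The loop is linear in the number of touched files, so the cost reported above comes from obtaining the manifests, not from comparing them.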
My profiler tells me that most of the time is spent in zlib.decompress
(to read the manifest from a compressed file, I guess), and there's
also a lot of load on revlog.py:chunk and manifest.py:parse.
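A measurement like this can be reproduced with the standard library's cProfile module. The sketch below is only an illustration: decompress_many is a made-up stand-in workload and profile_call is a hypothetical helper, neither of which comes from the script above or from Mercurial.

```python
import cProfile
import io
import pstats
import zlib


def decompress_many(blobs):
    """Stand-in workload: decompress every blob in the list."""
    return [zlib.decompress(b) for b in blobs]


def profile_call(func, *args):
    """Run func(*args) under cProfile.

    Returns the function's result and a text report of the ten entries
    with the highest cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()


if __name__ == "__main__":
    blobs = [zlib.compress(b"x" * 100000) for _ in range(50)]
    _, report = profile_call(decompress_many, blobs)
    print(report)
```

Sorting by cumulative time surfaces the callers that dominate the run, which is how hot spots such as decompression and parsing show up near the top of the report.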
Is there any other way to determine these numbers without having to
extract and parse the entire manifest per revision?
Jesper
More information about the Mercurial-devel mailing list