Quick way to find added/modified/removed across revisions?

Jesper Noehr jesper at noehr.org
Tue Jul 1 08:03:37 CDT 2008


Hi list,

I'm trying to find a quick way to work out how many files were
added/modified/removed by a changeset. I want to display these
numbers on the shortlog page together with the date, description and
author. My first attempt was to call repo.status(), which is a very
heavy operation. Next, I tried reading the manifest of the cset and
the manifest of its parent, and doing a cheap comparison of the two.
This also turned out to be extremely expensive (20 seconds for
25 revisions!).

Here's my code:

from mercurial import hg, node, ui

repo_path = '/path/to/a/large/repo'

if __name__ == "__main__":
    u = ui.ui()
    r = hg.repository(u, repo_path)

    cl = r.changelog
    tip = r.changectx('tip')
    # Last 25 revisions, tip included (xrange excludes its upper bound).
    ctx_range = xrange(cl.count() - 25, tip.rev() + 1)

    for cnum in ctx_range:
        ctx = r.changectx(cnum)
        parent = ctx.parents()[0]

        # Reading each manifest is where the time goes.
        this_manifest = ctx.manifest()
        parent_manifest = parent.manifest()

        added, modified, removed = 0, 0, 0

        # ctx.files() lists only the files touched by this cset.
        for file_ in ctx.files():
            if file_ in this_manifest and file_ in parent_manifest:
                modified += 1
            elif file_ in this_manifest:
                added += 1
            else:
                removed += 1

        short = node.short(ctx.node())
        print "in cset %s: added=%d, modified=%d, removed=%d" % (
            short, added, modified, removed)
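
To make the comparison itself easier to follow, here is the same counting
logic pulled out of the Mercurial calls. This is only a sketch: it assumes
the two manifests behave like plain dicts keyed by file name, and
classify_files is a name made up for illustration, not a Mercurial function.
Unlike the loop above, it does not count a name that is missing from both
manifests.

```python
def classify_files(files, this_manifest, parent_manifest):
    """Return (added, modified, removed) counts for the touched files.

    Assumes both manifests support `name in manifest`, as plain dicts do.
    """
    added = modified = removed = 0
    for name in files:
        in_this = name in this_manifest
        in_parent = name in parent_manifest
        if in_this and in_parent:
            modified += 1
        elif in_this:
            added += 1
        elif in_parent:
            removed += 1
        # A name in neither manifest is skipped here; the original
        # else branch would have counted it as removed.
    return added, modified, removed


# Hypothetical manifests: file name -> some node identifier.
parent = {'README': 'n1', 'setup.py': 'n2', 'old.txt': 'n3'}
this = {'README': 'n1b', 'setup.py': 'n2', 'new.txt': 'n4'}

print(classify_files(['README', 'old.txt', 'new.txt'], this, parent))
# prints (1, 1, 1)
```

The comparison is cheap once both manifests are in memory; the cost in the
real script comes from obtaining them.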

My profiler tells me that most of the time is spent in zlib.decompress  
(to read the manifest from a compressed file, I guess), and there's  
also a lot of load on revlog.py:chunk and manifest.py:parse.
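
For reference, a profile like this can be collected with the standard
cProfile and pstats modules. The sketch below is written for Python 3 and
does not touch Mercurial at all: top_functions and workload are names
invented for illustration, and workload is just a stand-in that spends its
time in zlib.decompress.

```python
import cProfile
import io
import pstats
import zlib


def top_functions(func, limit=10):
    """Run func() under cProfile and return a text report of the
    `limit` most expensive entries, sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats('cumulative').print_stats(limit)
    return out.getvalue()


def workload():
    # Hypothetical stand-in for reading manifests: compress a blob
    # once, then decompress it many times.
    blob = zlib.compress(b'some/file/name\x00' * 50000)
    for _ in range(200):
        zlib.decompress(blob)
```

Calling print(top_functions(workload)) lists the zlib.decompress calls
among the entries, which is the kind of output the numbers above came from.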

Is there any other way to determine these numbers without having to  
extract and parse the entire manifest per revision?


Jesper


More information about the Mercurial-devel mailing list