Quick way to find added/modified/removed across revisions?

Matt Mackall mpm at selenic.com
Tue Jul 1 09:53:45 CDT 2008


On Tue, 2008-07-01 at 15:03 +0200, Jesper Noehr wrote:
> Hi list,
> 
> I'm trying to sort out a quick way to figure out how many files were  
> added/modified/removed across a change. I want to display these  
> numbers on the shortlog page together with the date, description and  
> author. First try was calling repo.status(), which is a very heavy  
> operation. Next, I tried to read the manifest of the cset, together  
> with the manifest of the parent, and do some cheap comparison on  
> those. This turned out also to be extremely expensive (20 seconds for  
> 25 revisions!)

[...]

> My profiler tells me that most of the time is spent in zlib.decompress  
> (to read the manifest from a compressed file, I guess), and there's  
> also a lot of load on revlog.py:chunk and manifest.py:parse.

Since you seem to be reading the manifests in the forward direction, the
revlog should be caching most of this work. I would expect the mpatch
code (delta application) to show up most prominently in the profile.
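To make the caching point concrete, here is a toy model in Python. The
names are hypothetical and this is not Mercurial's actual revlog class.
It assumes a single chain: one full snapshot followed by deltas, each
stored as (start, end, data) hunks against the previous revision. The
reader remembers the last text it rebuilt, so reading forward applies one
delta per revision, while jumping backwards replays the chain from the
snapshot.

```python
from typing import List, Optional, Tuple

# A hunk replaces base[start:end] with data (offsets index the *old* text).
Hunk = Tuple[int, int, bytes]


def apply_hunks(base: bytes, hunks: List[Hunk]) -> bytes:
    """Apply non-overlapping hunks, sorted by start offset, to base."""
    out = []
    last = 0
    for start, end, data in hunks:
        out.append(base[last:start])  # copy the unchanged stretch
        out.append(data)              # splice in the new bytes
        last = end
    out.append(base[last:])
    return b"".join(out)


class ToyRevlog:
    """Hypothetical model: one full snapshot followed by a linear delta chain."""

    def __init__(self, snapshot: bytes, deltas: List[List[Hunk]]):
        self.snapshot = snapshot
        self.deltas = deltas  # deltas[i] turns revision i into revision i + 1
        self.cache: Optional[Tuple[int, bytes]] = None  # last (rev, text) built
        self.applied = 0      # number of deltas applied so far

    def revision(self, rev: int) -> bytes:
        # Start from the cached text if it lies on the way to rev,
        # otherwise fall back to the snapshot at the head of the chain.
        if self.cache is not None and self.cache[0] <= rev:
            current, text = self.cache
        else:
            current, text = 0, self.snapshot
        for i in range(current, rev):
            text = apply_hunks(text, self.deltas[i])
            self.applied += 1
        self.cache = (rev, text)
        return text


deltas = [
    [(0, 4, b"x=1\n")],  # rev 0 -> 1
    [(4, 8, b"y=1\n")],  # rev 1 -> 2
    [(0, 4, b"x=2\n")],  # rev 2 -> 3
]

forward = ToyRevlog(b"x=0\ny=0\n", deltas)
for r in range(4):
    forward.revision(r)
print(forward.applied)   # 3: one delta per step

backward = ToyRevlog(b"x=0\ny=0\n", deltas)
for r in reversed(range(4)):
    backward.revision(r)
print(backward.applied)  # 6: 3 + 2 + 1 + 0
```

Forward reading stays linear in the number of revisions; reading
backwards is quadratic in the chain length, which is the case the cache
cannot help with.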

In my tests (long ago), manifests tend to be large files with many small
changes, so the deltas end up being very small and quick to decompress,
while moving data around to apply the deltas tends to dominate. For
example, if we make 1000 single-file changes to a 1MB manifest, we have
to do at least 1GB of memcpy to reconstruct all those revisions, but
probably less than 1MB of decompression.
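A rough way to check that arithmetic, scaled down so it runs quickly:
the Python sketch below builds a synthetic manifest of 2000 fixed-width
entries (made-up paths and hashes, roughly the "path\0hexnode\n" layout)
and applies 200 changesets that each touch one file. It counts only the
bytes of each rebuilt full text against the raw bytes each delta carries
(before compression, ignoring hunk headers), so it is a lower bound on
copying, not a profile.

```python
import hashlib
from typing import Tuple


def manifest_line(i: int, salt: int) -> bytes:
    # "<path>\0<40 hex digits>\n": roughly the shape of a manifest entry.
    node = hashlib.sha1(b"%d-%d" % (i, salt)).hexdigest().encode()
    return b"src/file%05d.c\0%s\n" % (i, node)


def simulate(nfiles: int, nchanges: int) -> Tuple[int, int]:
    """Return (bytes copied rebuilding each revision, delta payload bytes)."""
    text = b"".join(manifest_line(i, 0) for i in range(nfiles))
    width = len(manifest_line(0, 0))  # all entries share one width here
    copied = 0
    payload = 0
    for rev in range(1, nchanges + 1):
        i = rev % nfiles                 # touch one file per changeset
        new = manifest_line(i, rev)
        start = i * width
        text = text[:start] + new + text[start + width:]
        copied += len(text)              # at least the full text is rebuilt
        payload += len(new)              # the delta carries only the new line
    return copied, payload


copied, payload = simulate(nfiles=2000, nchanges=200)
print(copied, payload, copied // payload)  # 22800000 11400 2000
```

The ratio works out to the number of entries in the manifest, which is
why the 1000-change, 1MB case lands around 1GB of copying against well
under 1MB of delta data.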

The verify command has a trick that avoids reconstructing the entire
manifest and should speed things up: it reads only the delta text. Each
changeset has a files field that lists every file it changed. Compare
that with the data from manifest.readdelta() and you should be able to
figure out which ones were added (new entry), modified (changed entry),
or removed (listed in the changeset but not in the manifest delta).
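A sketch of that classification step in Python, on plain data rather
than Mercurial objects: changed_files stands in for the changeset's files
field and manifest_delta for the filename-to-node entries you would get
out of manifest.readdelta(). Telling a new entry from a changed one needs
some knowledge of the parent; here that is modeled as a set parent_files,
a hypothetical input. It also assumes the stored delta was taken against
the parent's manifest; a revlog deltas against the previous revision in
the file, which is not always the parent.

```python
from typing import Dict, Iterable, List, Set, Tuple


def classify(changed_files: Iterable[str],
             manifest_delta: Dict[str, str],
             parent_files: Set[str]) -> Tuple[List[str], List[str], List[str]]:
    """Split a changeset's files into (added, modified, removed)."""
    added: List[str] = []
    modified: List[str] = []
    removed: List[str] = []
    for name in sorted(set(changed_files)):
        if name not in manifest_delta:
            removed.append(name)   # listed in the changeset, absent from the delta
        elif name in parent_files:
            modified.append(name)  # entry existed before and was rewritten
        else:
            added.append(name)     # entry is new in this manifest
    return added, modified, removed


added, modified, removed = classify(
    changed_files=["a.txt", "b.txt", "c.txt"],
    manifest_delta={"a.txt": "1" * 40, "b.txt": "2" * 40},
    parent_files={"b.txt", "c.txt"},
)
print(len(added), len(modified), len(removed))  # 1 1 1
```

For the shortlog, only the three lengths are needed, so the lists could
just as well be counters.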

-- 
Mathematics is the supreme nostalgia of our time.


