Quick way to find added/modified/removed across revisions?

Matt Mackall mpm at selenic.com
Tue Jul 1 14:26:15 CDT 2008


On Tue, 2008-07-01 at 19:49 +0200, Jesper Noehr wrote:
> On Jul 1, 2008, at 7:24 PM, Matt Mackall wrote:
> >
> > On Tue, 2008-07-01 at 17:42 +0200, Jesper Noehr wrote:
> >> On Jul 1, 2008, at 4:53 PM, Matt Mackall wrote:
> >>>
> >>> On Tue, 2008-07-01 at 15:03 +0200, Jesper Noehr wrote:
> >>>> Hi list,
> >>>>
> >>>> I'm trying to sort out a quick way to figure out how many files were
> >>>> added/modified/removed across a change. I want to display these
> >>>> numbers on the shortlog page together with the date, description and
> >>>> author. First try was calling repo.status(), which is a very heavy
> >>>> operation. Next, I tried to read the manifest of the cset, together
> >>>> with the manifest of the parent, and do some cheap comparison on
> >>>> those. This turned out also to be extremely expensive (20 seconds for
> >>>> 25 revisions!)
> >>>
> >>> [...]
> >>>
> >>>> My profiler tells me that most of the time is spent in zlib.decompress
> >>>> (to read the manifest from a compressed file, I guess), and there's
> >>>> also a lot of load on revlog.py:chunk and manifest.py:parse.
> 
> [...]
> 
> >> .readdelta() sure is promising, much faster than reading the entire
> >> manifest. The problem I'm having now is how to read the results of it.
> >> For example, say that I just grab one rev and these are the results:
> >>
> >> delta: {'django/trunk/django/db/models/sql/query.py': '\x98E
> >> \x90\xd8\xaa\xe6\xfc"\x98P\xc4\t\xd7\x15\xab\xb1Ecr\xcb'}
> >> files: ['django/trunk/django/db/models/sql/query.py']
> >>
> >> How do I figure out what happened to the file in this instance? Sorry
> >> if I'm asking a stupid question here.
> >
> > Well that tells us that query.py was either modified or added. To know
> > which, we have to know whether query.py was in the parent's manifest:
> >
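> > # files: names touched by the changeset; delta: result of .readdelta();
> > # manifest: full manifest of the parent, updated in place as we go
> > added, modified, removed = [], [], []
> > for f in files:
> >     if f not in delta:
> >         # touched but absent from the delta: removed
> >         removed.append(f)
> >         del manifest[f]
> >     elif f not in manifest:
> >         # in the delta but not in the parent: added
> >         added.append(f)
> >         manifest[f] = delta[f]
> >     else:
> >         # in both: modified
> >         modified.append(f)
> >         manifest[f] = delta[f]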
> 
> 
> Well, what manifest are you talking about here? The full manifest of the
> parent? In that case, it's necessary to extract and parse the full
> manifest of several changesets, as far as I can understand, and the code
> is back to being slow. I was hoping there was a way to do this without
> parsing the entire manifest. Imagine a use case of looking up the
> added/modified/removed count for 25 changesets.

If those changesets are all linearly related, you'll only need to unpack
the full manifest for the first one. Note how the above code updates the
manifest as it goes, so every later changeset needs only its delta.
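
To make the rolling update concrete, here is a rough sketch of counting
changes over a linear run of changesets. It deliberately uses plain dicts
and lists in place of real Mercurial objects: classify(), count_changes(),
and the sample data are hypothetical names invented for illustration, and
each (files, delta) pair stands in for a changeset's file list and the
result of .readdelta(). It also uses pop() where the loop quoted above
uses del, so a file missing from the parent does not raise.

```python
def classify(files, delta, manifest):
    """Sort one changeset's files into added/modified/removed.

    files    -- names touched by the changeset
    delta    -- {name: node} for entries added or changed by the changeset
    manifest -- {name: node} of the parent; updated in place to the child
    """
    added, modified, removed = [], [], []
    for f in files:
        if f not in delta:
            removed.append(f)
            manifest.pop(f, None)  # tolerate a file absent from the parent
        elif f not in manifest:
            added.append(f)
            manifest[f] = delta[f]
        else:
            modified.append(f)
            manifest[f] = delta[f]
    return added, modified, removed


def count_changes(base_manifest, changesets):
    """Return (added, modified, removed) counts per changeset.

    changesets is a sequence of (files, delta) pairs, oldest first, each
    the direct child of the previous one.
    """
    manifest = dict(base_manifest)  # the only full manifest we ever need
    counts = []
    for files, delta in changesets:
        a, m, r = classify(files, delta, manifest)
        counts.append((len(a), len(m), len(r)))
    return counts


# Made-up history: the parent of the first changeset holds a.py and b.py.
base = {"a.py": "n1", "b.py": "n2"}
history = [
    (["a.py", "c.py"], {"a.py": "n3", "c.py": "n4"}),  # modify a.py, add c.py
    (["b.py"], {}),                                    # remove b.py
    (["c.py"], {"c.py": "n5"}),                        # modify c.py
]
print(count_changes(base, history))  # [(1, 1, 0), (0, 0, 1), (0, 1, 0)]
```

The full manifest is copied once before the loop; every later step costs
only as much as the delta it consumes, which is the point of the approach.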

-- 
Mathematics is the supreme nostalgia of our time.



More information about the Mercurial-devel mailing list