Quick way to find added/modified/removed across revisions?

Jesper Noehr jnoehr at gmail.com
Tue Jul 1 12:49:10 CDT 2008


On Jul 1, 2008, at 7:24 PM, Matt Mackall wrote:
>
> On Tue, 2008-07-01 at 17:42 +0200, Jesper Noehr wrote:
>> On Jul 1, 2008, at 4:53 PM, Matt Mackall wrote:
>>>
>>> On Tue, 2008-07-01 at 15:03 +0200, Jesper Noehr wrote:
>>>> Hi list,
>>>>
>>>> I'm trying to sort out a quick way to figure out how many files were
>>>> added/modified/removed across a change. I want to display these
>>>> numbers on the shortlog page together with the date, description and
>>>> author. First try was calling repo.status(), which is a very heavy
>>>> operation. Next, I tried to read the manifest of the cset, together
>>>> with the manifest of the parent, and do some cheap comparison on
>>>> those. This turned out also to be extremely expensive (20 seconds for
>>>> 25 revisions!)
>>>
>>> [...]
>>>
>>>> My profiler tells me that most of the time is spent in zlib.decompress
>>>> (to read the manifest from a compressed file, I guess), and there's
>>>> also a lot of load on revlog.py:chunk and manifest.py:parse.

[...]

>> .readdelta() sure is promising, much faster than reading the entire
>> manifest. The problem I'm having now is how to read the results of it.
>> For example, say that I just grab one rev and these are the results:
>>
>> delta: {'django/trunk/django/db/models/sql/query.py': '\x98E\x90\xd8\xaa\xe6\xfc"\x98P\xc4\t\xd7\x15\xab\xb1Ecr\xcb'}
>> files: ['django/trunk/django/db/models/sql/query.py']
>>
>> How do I figure out what happened to the file in this instance? Sorry
>> if I'm asking a stupid question here.
>
> Well that tells us that query.py was either modified or added. To know
> which, we have to know whether query was in the parent:
>
> for f in files:
>     if f not in delta:
>         removed.append(f)
>         del manifest[f]
>     elif f not in manifest:
>         added.append(f)
>         manifest[f] = delta[f]
>     else:
>         modified.append(f)
>         manifest[f] = delta[f]
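
For reference, the quoted loop can be written out as a small self-contained function. This is only a sketch using plain dicts: the name classify() is hypothetical, `files` stands for the list of paths the changeset touched, `delta` for the {path: node} mapping shown above, and `manifest` for the parent's {path: node} mapping. It uses pop() rather than del so that a path missing from both mappings does not raise, which is an assumption beyond the quoted code.

```python
def classify(files, delta, manifest):
    """Sort one changeset's touched files into added/modified/removed.

    files    -- paths the changeset touched
    delta    -- {path: node} for touched paths that still exist afterwards
    manifest -- {path: node} for the parent; updated in place
    """
    added, modified, removed = [], [], []
    for f in files:
        if f not in delta:
            # Touched but absent from the delta: the file was removed.
            removed.append(f)
            manifest.pop(f, None)
        elif f not in manifest:
            # Present in the delta but not in the parent: newly added.
            added.append(f)
            manifest[f] = delta[f]
        else:
            # Present in both: modified.
            modified.append(f)
            manifest[f] = delta[f]
    return added, modified, removed
```

After the call, `manifest` describes the state after the changeset, so it can serve as the parent mapping for the next one.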


Well, what manifest are you talking about here? The full manifest of
the parent? In that case, as far as I can understand, it's necessary to
extract and parse the full manifest of several changesets, and the
code is back to being slow. I was hoping there was a way to do this
without parsing the entire manifest. Imagine a use case of looking up
the added/modified/removed counts for 25 changesets.
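
To make the use case concrete: one possible reading of the quoted loop is that a full manifest is parsed only once, for the parent of the oldest changeset, and then rolled forward by applying each delta in turn. Below is a minimal sketch of that reading with plain dicts. The name count_changes() is hypothetical, and it assumes the changesets form a linear run, oldest first, with each delta relative to the changeset before it. Whether .readdelta() guarantees that in every case is not something this thread confirms.

```python
def count_changes(base_manifest, changesets):
    """Return one (added, modified, removed) tuple per changeset.

    base_manifest -- {path: node} for the parent of the first changeset
    changesets    -- iterable of (files, delta) pairs, oldest first
    """
    manifest = dict(base_manifest)  # work on a copy; caller's dict is untouched
    counts = []
    for files, delta in changesets:
        added = modified = removed = 0
        for f in files:
            if f not in delta:
                removed += 1
                manifest.pop(f, None)
            else:
                if f in manifest:
                    modified += 1
                else:
                    added += 1
                manifest[f] = delta[f]
        counts.append((added, modified, removed))
    return counts
```

Under those assumptions, 25 changesets would cost one full manifest parse plus 25 delta reads, rather than 25 or more full parses.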


Jesper



More information about the Mercurial-devel mailing list