Quick way to find added/modified/removed across revisions?
Matt Mackall
mpm at selenic.com
Tue Jul 1 12:24:13 CDT 2008
On Tue, 2008-07-01 at 17:42 +0200, Jesper Noehr wrote:
> On Jul 1, 2008, at 4:53 PM, Matt Mackall wrote:
> >
> > On Tue, 2008-07-01 at 15:03 +0200, Jesper Noehr wrote:
> >> Hi list,
> >>
> >> I'm trying to sort out a quick way to figure out how many files were
> >> added/modified/removed across a change. I want to display these
> >> numbers on the shortlog page together with the date, description and
> >> author. First try was calling repo.status(), which is a very heavy
> >> operation. Next, I tried to read the manifest of the cset, together
> >> with the manifest of the parent, and do some cheap comparison on
> >> those. This turned out also to be extremely expensive (20 seconds for
> >> 25 revisions!)
> >
> > [...]
> >
> >> My profiler tells me that most of the time is spent in zlib.decompress
> >> (to read the manifest from a compressed file, I guess), and there's
> >> also a lot of load on revlog.py:chunk and manifest.py:parse.
> >
> > As you seem to be reading the manifest in the forward direction, it
> > should be caching most of this operation. I would expect the mpatch
> > code to show up most prominently.
> >
> > In my tests (long ago), manifests tended to be large files with many
> > small changes. So the deltas end up being very small and quick to
> > decompress, while moving data around to apply the deltas tends to
> > dominate. For example, if we make 1000 single-file changes to a 1M
> > manifest, we've got to do at least 1G of memcpy to reconstruct them
> > all, but probably less than 1M of uncompress.
> >
> > The verify command has a trick to avoid reconstructing the entire
> > manifest which should speed things up: reading only the delta text.
> > Each changeset has a files field which shows all files changed.
> > Compare that with the data from manifest.readdelta() and you should
> > be able to figure out which ones were added (new entry), modified
> > (changed entry), or removed (listed in changeset but not in manifest
> > delta).
>
> .readdelta() sure is promising, much faster than reading the entire
> manifest. The problem I'm having now is how to read the results of it.
> For example, say that I just grab one rev and these are the results:
>
> delta: {'django/trunk/django/db/models/sql/query.py': '\x98E
> \x90\xd8\xaa\xe6\xfc"\x98P\xc4\t\xd7\x15\xab\xb1Ecr\xcb'}
> files: ['django/trunk/django/db/models/sql/query.py']
>
> How do I figure out what happened to the file in this instance? Sorry
> if I'm asking a stupid question here.
Well, that tells us that query.py was either modified or added. To know
which, we have to know whether query.py was in the parent manifest:
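    # manifest: the parent's manifest (file name -> node)
    # delta:    the result of manifest.readdelta() for this changeset
    added, modified, removed = [], [], []
    for f in files:
        if f not in delta:
            # listed in the changeset but absent from the delta
            removed.append(f)
            del manifest[f]
        elif f not in manifest:
            # new entry, not present in the parent
            added.append(f)
            manifest[f] = delta[f]
        else:
            # entry existed in the parent and changed
            modified.append(f)
            manifest[f] = delta[f]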
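
For anyone who wants to try the logic outside of a repository, here is a
minimal self-contained sketch of the same loop. It uses plain dicts as
stand-ins for the parent manifest and the delta; the function name
classify_changes and the toy file names and node values are made up for
illustration and are not part of Mercurial's API.

```python
def classify_changes(files, delta, manifest):
    """Split a changeset's file list into added, modified, and removed.

    files    -- file names listed in the changeset
    delta    -- {file name: node} entries from the manifest delta
    manifest -- {file name: node} for the parent; updated in place so it
                can be reused as the parent of the next changeset
    """
    added, modified, removed = [], [], []
    for f in files:
        if f not in delta:
            # Listed in the changeset but absent from the delta: removed.
            removed.append(f)
            manifest.pop(f, None)
        elif f not in manifest:
            # Present in the delta but not in the parent: added.
            added.append(f)
            manifest[f] = delta[f]
        else:
            # Present in both: modified.
            modified.append(f)
            manifest[f] = delta[f]
    return added, modified, removed


# Toy data standing in for a parent manifest and one changeset.
parent = {"a.py": "n1", "b.py": "n2"}
files = ["a.py", "b.py", "c.py"]
delta = {"a.py": "n3", "c.py": "n4"}

added, modified, removed = classify_changes(files, delta, parent)
print(len(added), len(modified), len(removed))
```

Because the manifest dict is updated as it goes, walking the revisions in
forward order only needs one full manifest read at the start; every later
revision can be handled from its delta alone.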
>
> Jesper
--
Mathematics is the supreme nostalgia of our time.