Efficiently comparing manifests

Greg Ward greg at gerg.ca
Thu May 13 13:44:40 CDT 2010


API question: how do I *efficiently* compare manifests from related
changesets?  Specifically, I am writing a script to detect all dummy
merges in our repository, i.e. changesets where p2 != null and
p1.manifest == self.manifest.  I have to compare the manifest
contents, not just the manifest node ID, because Mercurial creates a
new manifest entry even when there are no diffs relative to the first
parent.
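To illustrate why the node IDs differ even when the contents match: as far as I understand the revlog format, a node ID is the SHA-1 of the two parent IDs (sorted) followed by the revision text, so a manifest with identical text but different parents gets a different node. A minimal standalone sketch of that, with made-up manifest text and hypothetical parent nodes (revlog_node is my own name, not a Mercurial function):

```python
import hashlib

NULLID = b"\0" * 20  # Mercurial's null node ID (20 zero bytes)

def revlog_node(text, p1, p2):
    """SHA-1 over the two parent IDs (smaller first), then the text."""
    lo, hi = sorted([p1, p2])
    h = hashlib.sha1(lo)
    h.update(hi)
    h.update(text)
    return h.digest()

# Made-up manifest text: a single "path\0<40 hex digits>\n" entry.
text = b"README\0" + b"ab" * 20 + b"\n"

# Hypothetical manifest nodes, invented purely for illustration.
older_node = hashlib.sha1(b"some older manifest").digest()
other_node = hashlib.sha1(b"manifest on the other branch").digest()

# First parent's manifest: one real parent, null second parent.
parent_node = revlog_node(text, older_node, NULLID)

# The merge's manifest: same text, but two non-null parents.
merge_node = revlog_node(text, parent_node, other_node)

same_node = merge_node == parent_node  # False although the text is equal
```

So comparing manifest node IDs can only ever prove equality, never rule it out for a merge.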

My first attempt was this:

def isdummymerge(ui, repo, rev):
    """Return true if rev is a dummy merge, i.e. has no delta relative to
    its first parent."""
    cctx = repo[rev]
    parent = cctx.p1().manifest()
    manifest = cctx.manifest()
    return parent == manifest

(I've already established that rev refers to a merge.)  But that
doesn't work because changectx.manifest() only includes file names and
node IDs; if two changesets differ only in permission changes, then
this will incorrectly conclude that the manifests are equal.  It's
also very slow; each entry in our manifest log is about 1.8 MB (18,000
files).
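To make the permissions problem concrete, here is a small standalone sketch that does not use Mercurial at all. It assumes the raw manifest text is one line per file of the form "path\0<40 hex digits><flags>\n", where flags is empty, "x" or "l"; parse_manifest is a hypothetical helper of mine, not the real parser:

```python
def parse_manifest(text):
    """Map each path to a (hex node, flags) pair."""
    entries = {}
    for line in text.split(b"\n"):
        if not line:
            continue  # skip the empty piece after the final newline
        path, rest = line.split(b"\0", 1)
        entries[path] = (rest[:40], rest[40:])
    return entries

node = b"ab" * 20  # made-up 40-character hex node ID

# Same file content in both; only the executable bit was added.
old = parse_manifest(b"build.sh\0" + node + b"\n")
new = parse_manifest(b"build.sh\0" + node + b"x\n")

# Comparing path -> node only misses the flag change ...
nodes_only_equal = ({p: e[0] for p, e in old.items()} ==
                    {p: e[0] for p, e in new.items()})

# ... while comparing path -> (node, flags) catches it.
fully_equal = old == new
```

Here nodes_only_equal is True and fully_equal is False, which is exactly the false positive described above.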

So I peeled a layer off the onion and went straight to the manifest class:

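def isdummymerge(ui, repo, rev):
    """Return true if rev's raw manifest text equals its first parent's."""
    cl = repo.changelog
    ml = repo.manifest

    node = cl.node(rev)
    p1 = cl.parents(node)[0]

    # cl.read(n)[0] is the manifest node recorded in changeset n;
    # ml.revision() returns the raw, unparsed manifest text.
    mymft = ml.revision(cl.read(node)[0])
    p1mft = ml.revision(cl.read(p1)[0])
    return mymft == p1mft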

This seems to work, but it's still awfully slow.  (It should be a bit
faster since I'm not parsing that 1.8 MB string into an 18,000-entry
dict, but I don't think that's the bottleneck.)  ISTR someone
(Benoit?) advising use of changectx.manifestdelta() (or
manifest.readdelta()) for stuff like this, but I don't see how that
can work: it appears to me that manifest.readdelta() just returns the
diff between rev-1 and rev.  That's no good: I need the diff between
parent1(node) and node.
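To spell out what I mean by "the diff between parent1(node) and node", here is a standalone sketch over two already-parsed manifests. It assumes each manifest is a plain dict mapping path to a (node, flags) pair; manifest_delta is a hypothetical name of mine and has nothing to do with how manifest.readdelta() is implemented:

```python
def manifest_delta(parent, child):
    """Return {path: (old entry, new entry)} for every path that differs.

    A missing entry is reported as None, so additions and removals
    show up as well as changed nodes or flags.
    """
    delta = {}
    for path in set(parent) | set(child):
        old = parent.get(path)
        new = child.get(path)
        if old != new:
            delta[path] = (old, new)
    return delta

# Made-up manifests, for illustration only.
p1 = {b"a.txt": (b"11" * 20, b""), b"run.sh": (b"22" * 20, b"")}
same = dict(p1)
changed = {b"a.txt": (b"11" * 20, b""), b"run.sh": (b"22" * 20, b"x")}

dummy = not manifest_delta(p1, same)         # empty delta: dummy merge
not_dummy = bool(manifest_delta(p1, changed))  # flag change detected
```

An empty result against the first parent is what I want to detect, ideally without reading and comparing two 1.8 MB texts per merge.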

Suggestions?

Greg

