Hg diffs are not minimal

Matt Mackall mpm at selenic.com
Wed Aug 10 20:58:36 CDT 2005


On Wed, Aug 10, 2005 at 07:20:31PM -0400, Chuck Ebbert wrote:
> 
>   I used 'hg diff' to generate a diff between linux 2.6.13-rc5 and -rc6.
> There were 32 extra inserts and 32 extra deletes vs. a patch generated by
> diffing the source trees.  Looks like mercurial patches sometimes add and
> remove the exact same line.

Yes.

There's some rather long discussions in the archives about diff
algorithms.

First, let's note that 'diff' diffs are not necessarily minimal
either. That algorithm assumes that all edits (copy, insert, delete)
are the same cost. And as the algorithm has rather expensive
worst-case behavior, 'diff' will take shortcuts fairly regularly.

The Mercurial diff algorithm (based on Python's difflib) works by
finding the largest unchanged block in the file, then finding the
largest unchanged block on either side, and so on.

For purposes of matching, it ignores lines that are often repeated.
Not only does it make diff faster, but it less often gets "confused"
and out of sync by braces from other functions.

-- 
Mathematics is the supreme nostalgia of our time.


More information about the Mercurial mailing list