[PATCH 0 of 5] RFC: Implement O(n + m) algorithm for findrenames()

David Greenaway hg-dev at davidgreenaway.com
Sat Mar 6 23:30:16 CST 2010


On Sat, Mar 06, 2010 at 11:03:57PM -0600, Augie Fackler wrote:
> OOC, have you seen http://bitbucket.org/cbarrett/guess-renames/?
> Specifically, Aaron Bentley's algorithm for this?
> http://osdir.com/ml/bazaar/2009-03/msg01300.html

Thanks for the link. I wasn't aware that Bazaar had similar
functionality. The closest thing I found was:

    http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf

which seemed more complex than necessary for finding renames.

Bentley's algorithm appears to use pairs of lines instead of k-byte
blocks for finding potential matches. Additionally, when scoring
potential matches, Bentley's algorithm gives less weight to common
blocks over less common blocks.

It would be interesting to compare the performance / accuracy of the two
algorithms. The latter feature of Bentley's algorithm would likely help
with its accuracy.

Cheers,
David




More information about the Mercurial-devel mailing list