[PATCH] auto rename: best matches and speed improvement UPDATE4

Fri Aug 15 19:16:31 CDT 2008

Matt Mackall wrote:
> On Sat, 2008-08-16 at 01:48 +0200, Herbert Griebel wrote:
>> # HG changeset patch
>> # User Herbert Griebel <herbertg at gmx.at>
>> # Date 1218843942 -7200
>> # Node ID 6cf73b7c3671f0737e25c72d45fa3ca805426df6
>> # Parent  438e02b4be73ffd80d4a6cc45b7c9273d6b26604
>> auto rename: best matches and speed improvement
>>
>> The auto rename feature of addremove did not rename to the best match.
>> The old implementation looped over "added files" and selected the best
>> match for each individually. The next "added file" could also have its
>> best match with the same removed file, but which is far lower than the
>> one before. Therefore the last "added file" wins which may not be the
>> best.
>> The new implementation fixes this. The rename list is also sorted in
>> reverse order to make it easier to find the wrong matches with low
>> similarity. For speed improvements the loops are dynamically
>> interchanged which can speed up the matching considerably.
>> There is also a very rough estimation of the matching time, and
>> output has been enhanced to give the user feedback about the
>> progress.
> 
> Thanks for looking into this, Herbert.
> 
> First off, this will want a test case.
ok, this will take me some time, since I have no Linux currently.

> 
> Second, I get the impression that the time complexity here is going from
> O(n) to O(n**2), is that right?

The complexity is the same, only all matches are now checked against each other.
For this there is an extra loop at the end of findrenames which is fast.
Speed has improved because the loop order is exchanged. There are also
no more multiple occurances in the list.

What really is lacking is the quality of the diff. I have files with almost no similarity
which get a score of almost 50%. I will take a look at that.