[PATCH] auto rename: best matches and speed improvements PLEASE REVIEW

Herbert Griebel herbertg at gmx.at
Sat Sep 27 18:50:11 CDT 2008


# HG changeset patch
# User Herbert Griebel <herbertg at gmx.at>
# Date 1222555608 -7200
# Node ID d5cab1d3339f14d07723944493f88c9b28b49e28
# Parent  f29b674cc2210126c2899d94d882c367a8ea64bc
auto rename: best matches and speed improvements

I fixed the problem with the file diff score and made some
further speed improvements; it is now roughly 5 times faster
than the previous version. Explanations of the algorithm
are now in the code.
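
For readers without the patch at hand, here is a minimal sketch of
what "best matches" can mean for rename detection. It is not the
code from this patch: the function names (similarity, best_renames),
the use of difflib for scoring, and the one-to-one greedy pairing
are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Hypothetical score in [0.0, 1.0]; difflib stands in for the real diff score."""
    return SequenceMatcher(None, a, b).ratio()


def best_renames(removed, added, threshold):
    """Pair each added file with its best-scoring removed file.

    removed, added: dicts mapping file name -> file content (str).
    threshold: minimum similarity (0.0 to 1.0) for a pair to count.
    Returns a dict mapping added name -> (removed name, score).
    Assumption: each removed file is used as a source at most once.
    """
    candidates = []
    for rname, rdata in removed.items():
        for aname, adata in added.items():
            score = similarity(rdata, adata)
            if score >= threshold:
                candidates.append((score, rname, aname))

    # Highest score first; names break ties so the result is deterministic.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    used_removed = set()
    result = {}
    for score, rname, aname in candidates:
        if rname in used_removed or aname in result:
            continue
        used_removed.add(rname)
        result[aname] = (rname, score)
    return result
```

This naive version scores every removed/added pair, so its cost grows
with the product of the two file counts, which is the kind of cost
that makes speed matter for a working dir of the size measured below.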

Some performance figures:

Command used: hg addremove -n -sX
Measurements were taken with a warm disk cache to get stable times.

Working dir:
1055 removed, 1234 added, lots of changes in about 30% of the files,
roughly 80% moves and 20% renames

similarity   time
1%           82.0s
50%          41.5s
60%          36.6s
80%          26.4s
90%          21.3s
100%         12.0s
The old version took 59 min, independent of the similarity.


I have spent quite some time on this algorithm and have finally
addressed everything that bothered me. This is the final version;
from here on I will only do the work necessary for integration.

I also wanted to change the output and add some additional
command line options to give the user more control over
the rename, but I dropped those other patches. So there is now
only one patch: it changes only the findrenames function,
adds some fast C code for it, and updates the test.

If someone could review the code so that it can be integrated,
that would be great.


Herbert


