[PATCH] auto rename: best matches and speed improvements UPDATE3 - Matts + Petr's findings added

Petr Kodl petrkodl at gmail.com
Thu Oct 2 18:26:29 CDT 2008


>
> >
> > Uh oh. Why did we grow more C code?
>
> Histogram and the Levenshtein distance can only be done
> in C efficiently. Histograms are used to pre-compare
> the files and get an upper-bound for the score. The
> Levenshtein distance is needed for the name matching
> when moving identical files (we discussed this couple
> of weeks ago).
>

The Levenshtein distance and reverse string search do not make that much
difference - they only operate on short filename strings - but the
histogram-related functions have to be in C. I tried reimplementing the C
functions in Python just for the fun of it, and the speed degrades too much:
several times slower for just 632 renames of mostly source code files.
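
To make the histogram idea concrete, here is a minimal pure-Python sketch of
an upper bound on a similarity score. It is not the code from the patch: the
function name histogram_upper_bound and the 2*matches/(len(a)+len(b)) score
are assumptions for illustration. The multiset overlap of byte counts can
never be smaller than the number of bytes a real comparison matches, so a
pair whose bound is already below the threshold can be skipped.

```python
from collections import Counter


def histogram_upper_bound(a: bytes, b: bytes) -> float:
    """Upper bound on a 2*matches/(len(a)+len(b)) similarity score.

    Hypothetical helper, not taken from the patch.
    """
    total = len(a) + len(b)
    if total == 0:
        # Two empty files are treated as identical.
        return 1.0
    hist_a = Counter(a)  # byte value -> number of occurrences in a
    hist_b = Counter(b)
    # A byte value can be matched at most min(count_a, count_b) times.
    common = sum(min(n, hist_b[byte]) for byte, n in hist_a.items())
    return 2.0 * common / total
```

A caller would compute this bound first and run the expensive comparison
only for pairs whose bound reaches the similarity threshold.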

Attached is haddremove.py, where you can switch between the C and native
Python versions - just change the "if 1:" at the top. I do not think Python
can do it much more efficiently than this.
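
For reference, the Levenshtein distance used for name matching can be
sketched in pure Python as below. This is a textbook two-row version and not
the contents of haddremove.py; the function name levenshtein is an
assumption. On short filename strings this is cheap, which is why it matters
less than the histogram code.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute cost 1)."""
    if len(a) < len(b):
        # Keep the shorter string in the inner loop to use less memory.
        a, b = b, a
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]
```

When several removed files have identical content, the added file would be
paired with the candidate whose name has the smallest distance.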

pk
-------------- next part --------------
A non-text attachment was scrubbed...
Name: haddremove.py
Type: text/x-python
Size: 13894 bytes
Desc: not available
Url : http://selenic.com/pipermail/mercurial-devel/attachments/20081002/c99cc4c2/attachment.py 