[PATCH] auto rename: best matches and speed improvement UPDATE4

Sat Aug 16 11:57:05 CDT 2008

On Sat, 2008-08-16 at 11:24 +0200, Herbert Griebel wrote:
> Matt Mackall wrote:
> > On Sat, 2008-08-16 at 02:16 +0200, Herbert Griebel wrote:
> >> Matt Mackall wrote:
> >>> Thanks for looking into this, Herbert.
> >>>
> >>> First off, this will want a test case.
> >> ok, this will take me some time, since I have no Linux currently.
> >>
> >>> Second, I get the impression that the time complexity here is going from
> >>> O(n) to O(n**2), is that right?
> >> The complexity is the same, only all matches are now checked against each other.
> > 
> > If I do 1000 renames, I end up doing 1000**2 comparisons, right? This
> > was probably already the case with the original algorithm, but it's
> > still a worry.
> Yes, it will always be O(n**2); you have to compare each file with all
> other files. The only thing you can do is to make the comparison of the
> files more efficient. For example:
> 
>  - if the file sizes differ by more than the similarity threshold,
>    don't even read the files.

This can be extended so that files of similar sizes are compared first.
And if we have better than xx% match, we needn't compare files more than
xx% different in size.
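
A rough sketch of that pruning, not the actual patch. It assumes the
similarity score has the form 2 * matched_bytes / (size_a + size_b), so
that the file sizes alone give an upper bound on the score. The names
max_similarity, best_match and the score callback are hypothetical:

```python
def max_similarity(size_a, size_b):
    """Upper bound on similarity from sizes alone.

    Assumes score = 2 * matched / (size_a + size_b); at most
    min(size_a, size_b) bytes can match.
    """
    total = size_a + size_b
    if total == 0:
        return 1.0  # two empty files are identical
    return 2.0 * min(size_a, size_b) / total


def best_match(removed_size, added_sizes, threshold, score):
    """Find the best rename target while reading as few files as possible.

    added_sizes maps candidate name -> size in bytes.
    score(name) is the expensive content comparison (hypothetical callback).
    Returns (best_name, best_score, number_of_files_read).
    """
    # Try the candidates with the highest possible score first.
    ordered = sorted(
        added_sizes.items(),
        key=lambda item: max_similarity(removed_size, item[1]),
        reverse=True,
    )
    best_name, best_score, reads = None, 0.0, 0
    for name, size in ordered:
        bound = max_similarity(removed_size, size)
        # Bounds only decrease from here on, so once a candidate cannot
        # reach the threshold or beat the current best, we can stop.
        if bound < threshold or bound <= best_score:
            break
        reads += 1
        s = score(name)
        if s >= threshold and s > best_score:
            best_name, best_score = name, s
    return best_name, best_score, reads
```

With a 100-byte removed file and candidates of 100, 95, 300 and 10
bytes, a 90% match on the 95-byte file means the 300- and 10-byte files
are never read. The outer loop over removed files is still O(n**2) in
the worst case; this only makes most of the comparisons cheap.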

>  - take file pathnames into account:
>    - a *.cpp file will never get a *.bmp file
>    - it is unlikely that a binary file will get an ascii file

We could try to compare things with similar names or matching extensions
first. But Mercurial intentionally knows as little as possible about the
meaning of file names and whether or not things are 'binary'.
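
For illustration only, ordering by name could look something like this.
It only changes which candidates are tried first and never rules a pair
out, so no meaning is attached to the extension. order_candidates is a
hypothetical helper, not an existing Mercurial function:

```python
import os


def order_candidates(removed, added):
    """Return added paths, most promising first.

    Same basename comes first, then same extension, then everything
    else. Nothing is excluded; the sort is stable, so ties keep their
    original order.
    """
    base = os.path.basename(removed)
    ext = os.path.splitext(removed)[1]

    def rank(path):
        same_name = os.path.basename(path) == base
        same_ext = os.path.splitext(path)[1] == ext
        # False sorts before True, so matches come first.
        return (not same_name, not same_ext)

    return sorted(added, key=rank)
```

Combined with an early exit on a good enough match, this would tend to
find the obvious rename quickly without Mercurial having to know what a
.cpp or .bmp file is.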

> I have problems using the -X option in addremove for multiple files,
> or globbing. Are there any examples? I cannot get it to work. How do I
> exclude two files?

$ touch a b c
$ hg add -X b -X c
adding a

-- 
Mathematics is the supreme nostalgia of our time.