improved autorename of addremove

Tue Mar 31 10:12:10 CDT 2009

On Tue, Mar 31, 2009 at 8:54 AM, Herbert Griebel <herbertg at gmx.at> wrote:
> Hi,
>
> I uploaded all the code for the improved autorename feature on bitbucket.
> There are two branches:
>
> Python only, branch autorename:
> http://bitbucket.org/herb/hg/changeset/65d7c5c06e00/
>
> Fast C code, branch autorename_c_code:
> http://bitbucket.org/herb/hg/changeset/f545a7a70303/
>
>
> The code is the same, except for added comments, a minor fix,
> and an improved name matching algorithm.
>
> The name matching can now correctly match a large set of identical files
> when they are moved. Example:
>
> All a.txt files and their folders are moved to folder x;
> all a.txt files have the same content
> (moving all files back from x also works):
>
> removing a.txt
> removing a/a.txt
> removing a/a/a.txt
> removing a/b/a.txt
> removing b/a.txt
> removing b/a/a.txt
> removing c/a.txt
> removing c/a/a.txt
> removing c/a/a/a.txt
> removing c/a/a/b/a.txt
> removing c/a/a/b/c/a.txt
> removing c/a/b/a.txt
> adding x/a.txt
> adding x/a/a.txt
> adding x/a/a/a.txt
> adding x/a/b/a.txt
> adding x/b/a.txt
> adding x/b/a/a.txt
> adding x/c/a.txt
> adding x/c/a/a.txt
> adding x/c/a/a/a.txt
> adding x/c/a/a/b/a.txt
> adding x/c/a/a/b/c/a.txt
> adding x/c/a/b/a.txt
> recording removal of c\a\a\b\c\a.txt as rename to x\c\a\a\b\c\a.txt (100% similar)
> recording removal of c\a\a\b\a.txt as rename to x\c\a\a\b\a.txt (100% similar)
> recording removal of c\a\b\a.txt as rename to x\c\a\b\a.txt (100% similar)
> recording removal of c\a\a\a.txt as rename to x\c\a\a\a.txt (100% similar)
> recording removal of c\a\a.txt as rename to x\c\a\a.txt (100% similar)
> recording removal of b\a\a.txt as rename to x\b\a\a.txt (100% similar)
> recording removal of a\b\a.txt as rename to x\a\b\a.txt (100% similar)
> recording removal of a\a\a.txt as rename to x\a\a\a.txt (100% similar)
> recording removal of c\a.txt as rename to x\c\a.txt (100% similar)
> recording removal of b\a.txt as rename to x\b\a.txt (100% similar)
> recording removal of a\a.txt as rename to x\a\a.txt (100% similar)
> recording removal of a.txt as rename to x\a.txt (100% similar)
> Elapsed time: 00:00:00,33  (23:00:19,54 to 23:00:19,87)
>
>
> Again, the matching algorithm can only give the most likely matches
> based on the content and pathname of each file; it cannot guess the
> user's intention. For example, a->b is a 90% match, and
> c->d is also a 90% match. Then it is quite likely you want
> a->b and c->d, but it could also be the other way round, a->d and c->b,
> for reasons other than name and content matching. That's also
> the reason why I think the ultimate solution is not a command
> line tool but a nice GUI which lets you choose the correct matches
> easily with the help of good similarity matching.

I agree; that's why TortoiseHg has had this since 0.7, as 'hgtk guess'.
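
To make the ambiguity concrete: when several files have identical
content, the content score ties for every pair, so only the path can
separate the candidates. The sketch below is not the algorithm from the
autorename branches, just a minimal illustration in Python. The names
common_tail and pair_identical are hypothetical, and the name score
(number of shared trailing path components) is an assumption chosen for
simplicity.

```python
def common_tail(old, new):
    """Count the trailing path components that two paths share."""
    count = 0
    for a, b in zip(reversed(old.split("/")), reversed(new.split("/"))):
        if a != b:
            break
        count += 1
    return count


def pair_identical(removed, added):
    """Greedily pair removed and added paths, best name score first.

    Assumes every file in both lists has the same content, so only the
    path can tell the candidates apart.
    """
    # sorted() is stable, so candidates with equal scores keep the
    # order in which they were generated.
    candidates = sorted(
        ((common_tail(old, new), old, new) for old in removed for new in added),
        key=lambda c: -c[0],
    )
    pairs = {}
    taken = set()
    for score, old, new in candidates:
        if old in pairs or new in taken:
            continue  # one side is already matched
        pairs[old] = new
        taken.add(new)
    return pairs


removed = ["a.txt", "a/a.txt", "b/a.txt"]
added = ["x/a.txt", "x/a/a.txt", "x/b/a.txt"]
for old, new in sorted(pair_identical(removed, added).items()):
    print("%s -> %s" % (old, new))
```

When two candidates tie on the name score as well, the sketch simply
keeps the first one it generated. That is exactly the case where a
human has to choose.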

> I think the biggest potential for improvement is in the name
> matching. Speeding up the content matching is next. Getting better
> statistics to avoid byte-by-byte compares is very hard.
>
> If some of the comments or explanations are confusing, please let me know.
> Any comments/fixes/improvements/patches are welcome and appreciated!
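
On the point about statistics: one cheap way to skip some byte-by-byte
compares is to bound the similarity from byte counts before running the
full comparison. The sketch below is an illustration under stated
assumptions, not the patch's code. It uses Python's difflib as a
stand-in for the real content comparison, and upper_bound, similarity
and the 0.5 threshold are all hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def upper_bound(a, b):
    """Upper bound on the SequenceMatcher ratio, from byte counts alone.

    The number of matching bytes can never exceed the size of the
    multiset intersection of the two byte histograms.
    """
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    shared = sum((Counter(a) & Counter(b)).values())
    return 2.0 * shared / total


def similarity(a, b, threshold=0.5):
    """Return the full ratio, or None if the cheap bound rules the pair out."""
    if upper_bound(a, b) < threshold:
        return None  # cannot reach the threshold; skip the expensive compare
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


print(similarity(b"hello world", b"hello world"))
print(similarity(b"aaaa", b"bbbb"))
print(similarity(b"abcd", b"dcba"))
```

The bound never rejects a pair that the full compare would accept, but
it often fails to reject dissimilar files whose byte counts happen to
look alike: b"abcd" versus b"dcba" passes the bound yet scores low on
the full compare.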

--
Steve