[PATCH] auto rename: best matches and speed improvement UPDATE4

Bill Sommerfeld sommerfeld at sun.com
Sat Aug 16 08:36:07 CDT 2008


On Sat, 2008-08-16 at 11:24 +0200, Herbert Griebel wrote:
>  - compare the crc: maybe from the repo you get it for free, for
>    the file you have to calc it. There may be something like very
>    efficient "CRC similarity" measures or so. 

I very much doubt it.  Functions intended to detect small modifications
to data in transit work best if they behave like pseudo-random functions
-- in other words, if you change one input bit, on average half of the
output bits should change.  You can't get a very good distance metric
out of this.
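
A rough way to see this, as a sketch rather than anything from the patch: take zlib's CRC-32 (an assumption; the thread does not say which CRC is meant) and count how many checksum bits differ after a one-bit edit versus after replacing the content with something unrelated. The helper names below are made up for illustration.

```python
import zlib


def hamming32(a: int, b: int) -> int:
    """Count the differing bits between two 32-bit values."""
    return bin((a ^ b) & 0xFFFFFFFF).count("1")


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def crc_distance(a: bytes, b: bytes) -> int:
    """Hamming distance between the CRC-32 values of two byte strings."""
    return hamming32(zlib.crc32(a), zlib.crc32(b))


original = b"def main():\n    print('hello world')\n" * 8
one_bit_edit = flip_bit(original, 100)   # nearly identical content
unrelated = bytes(reversed(original))    # same bytes, scrambled order

print("one-bit edit:", crc_distance(original, one_bit_edit))
print("unrelated   :", crc_distance(original, unrelated))
```

Both numbers fall somewhere between 1 and 32 and say nothing about how much of the file actually changed, which is why the checksum distance is not a usable similarity score.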

>    which I found: the rsync on UNIX has a very
>    efficient comparison algorithm using "rolling" checksums:
> http://www.itworld.com/unix-shuffle-file-systems-rsynch-nlsunix-080116?page=0%2C1

rsync is solving a very different problem: trying to minimize the amount
of communication between two processes which may be connected by a small
pipe (and possibly burning a lot of cpu to compensate).  In this case
one process can see both file trees; as with the recent change to binary
file compares, it's likely more efficient to just compare the files
directly rather than to compute rolling checksums and then compare the
checksums.
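
As a minimal sketch of what "compare the files directly" could look like when one process can read both files (this is not the code from the patch; files_equal and the chunk size are hypothetical):

```python
import os
import tempfile


def files_equal(path_a: str, path_b: str, chunk_size: int = 64 * 1024) -> bool:
    """Compare two files byte for byte, stopping at the first difference."""
    # Different sizes can never match, so skip reading entirely.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False          # early exit, no checksum needed
            if not chunk_a:
                return True           # both files exhausted together


# Small demonstration with throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.bin")
    b = os.path.join(tmp, "b.bin")
    c = os.path.join(tmp, "c.bin")
    payload = b"x" * 200_000
    for path, data in ((a, payload), (b, payload), (c, payload[:-1] + b"y")):
        with open(path, "wb") as fh:
            fh.write(data)
    same = files_equal(a, b)
    different = files_equal(a, c)

print(same, different)
```

Each file is read at most once and the loop stops at the first mismatching chunk, whereas a checksum-based approach has to read both files in full before it can compare anything. The standard library's filecmp.cmp(a, b, shallow=False) does a similar byte-level comparison.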

						- Bill




