[PATCH] auto rename: best matches and speed improvement UPDATE4
Bill Sommerfeld
sommerfeld at sun.com
Sat Aug 16 08:36:07 CDT 2008
On Sat, 2008-08-16 at 11:24 +0200, Herbert Griebel wrote:
> - compare the crc: maybe from the repo you get it for free, for
> the file you have to calc it. There may be something like very
> efficient "CRC similarity" measures or so.
I very much doubt it. Functions intended to detect small modifications
to data in transit work best if they behave like pseudo-random functions
-- in other words, if you change one input bit, on average half of the
output bits should change. You can't get a very good distance metric
out of this.
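
A rough way to see this for yourself, as a sketch rather than anything
from the patch: take the CRC32 of a buffer, of the same buffer with one
bit flipped, and of some unrelated data, then count how many checksum
bits differ in each case. The helper names (bit_distance, crc_distance)
and the sample inputs are made up for illustration; zlib.crc32 is the
standard library call.

```python
import zlib


def bit_distance(x, y):
    """Number of bit positions in which two integers differ."""
    return bin(x ^ y).count("1")


def crc_distance(data_a, data_b):
    """Bit distance between the CRC32 checksums of two byte strings."""
    return bit_distance(zlib.crc32(data_a), zlib.crc32(data_b))


# Hypothetical sample inputs: a small "file", a copy with one bit
# flipped in the first byte, and a buffer with unrelated contents.
original = b"def rename(src, dst):\n    return move(src, dst)\n" * 20
flipped = bytearray(original)
flipped[0] ^= 0x01
one_bit_off = bytes(flipped)
unrelated = bytes(range(256)) * 4

near = crc_distance(original, one_bit_off)
far = crc_distance(original, unrelated)

print("one bit changed: %d of 32 checksum bits differ" % near)
print("unrelated data:  %d of 32 checksum bits differ" % far)
```

The count tells you the inputs differ, but nothing forces the first
number to be smaller than the second, so it is of little use for
ranking rename candidates by similarity.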
> which I found: the rsync on UNIX has a very
> efficient comparison algorithm using "rolling" checksums:
> http://www.itworld.com/unix-shuffle-file-systems-rsynch-nlsunix-080116?page=0%2C1
rsync is solving a very different problem: trying to minimize the amount
of communication between two processes which may be connected by a small
pipe (and possibly burning a lot of cpu to compensate). In this case
one process can see both file trees; as with the recent change to binary
file compares, it's likely more efficient to just compare the files
directly rather than to compute rolling checksums and then compare the
checksums.
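
As a minimal sketch of what "compare the files directly" could look
like when one process can read both files (this is not the code from
the patch; files_equal, demo, and the chunk size are assumptions for
illustration):

```python
import os
import tempfile


def files_equal(path_a, path_b, chunk_size=64 * 1024):
    """Return True if the two files have identical contents."""
    # Files of different sizes can never match, so skip reading them.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # stop at the first differing chunk
            if not chunk_a:
                return True  # both files exhausted, no difference found


def demo():
    """Write three small temporary files and compare a with b, a with c."""
    samples = [
        ("a", b"x" * 1000),
        ("b", b"x" * 1000),
        ("c", b"x" * 999 + b"y"),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        paths = {}
        for name, data in samples:
            paths[name] = os.path.join(tmp, name)
            with open(paths[name], "wb") as f:
                f.write(data)
        return (files_equal(paths["a"], paths["b"]),
                files_equal(paths["a"], paths["c"]))
```

Each file is read at most once and the loop stops at the first
mismatch, whereas a checksum pass has to read both files in full before
anything can be compared.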
- Bill