[PATCH 0 of 5] Implement O(n) algorithm for findrenames() [version 2]

Sun Mar 21 16:01:30 CDT 2010

On Sun, Mar 21, 2010 at 9:07 PM, Mads Kiilerich <mads at kiilerich.com> wrote:
> David Greenaway wrote, On 03/21/2010 08:33 AM:
>> The algorithm used (inspired by Andrew Tridgell's "rsync" algorithm)
>> splits the source files into k-byte blocks,
>
> "k-byte" refers to _BLOCK_SIZE = 64?
>
>> and places a 32-bit hash of
>> each block into a dictionary. We then see if any of the target files
>> contain any matching hashes from the source files. The more blocks
>> a particular target file has in common with a particular source file,
>> the more similar those files are considered. Full details of the
>> algorithm are below.
>>
>
> So if one extra character is inserted in the beginning of a file none of the
> hashes will match? There is no rolling checksum?

If it's rsync-like then it will match. But it won't match if every
block is modified, that's the main flaw.

Benoit