[PATCH 0 of 5] Implement O(n) algorithm for findrenames() [version 2]

Sun Mar 21 19:16:26 CDT 2010

Benoit Boissinot wrote, On 03/21/2010 10:01 PM:
> On Sun, Mar 21, 2010 at 9:07 PM, Mads Kiilerich<mads at kiilerich.com>  wrote:
>    
>> David Greenaway wrote, On 03/21/2010 08:33 AM:
>>      
>>> The algorithm used (inspired by Andrew Tridgell's "rsync" algorithm)
>>> splits the source files into k-byte blocks,
>>>        
>> "k-byte" refers to _BLOCK_SIZE = 64?
>>
>>      
>>> and places a 32-bit hash of
>>> each block into a dictionary. We then see if any of the target files
>>> contain any matching hashes from the source files. The more blocks
>>> a particular target file has in common with a particular source file,
>>> the more similar those files are considered. Full details of the
>>> algorithm are below.
>>>
>>>        
>> So if one extra character is inserted in the beginning of a file none of the
>> hashes will match? There is no rolling checksum?
>>      
> If it's rsync-like then it will match. But it won't match if every
> block is modified, that's the main flaw.
>    

Right - sorry, I was wrong. I had missed that the proposed 
implementation actually do use the rolling checksum when looking up in 
the dictionary - which was created un-rolling. Most of my comments are 
thus made on false premises and not relevant.

/Mads