RFC: version (big) file snapshots with storage outside a Mercurial repo with snap

Adrian Buehlmann adrian at cadifra.com
Tue Aug 17 14:26:26 CDT 2010



On 17.08.2010 20:52, Klaus Koch wrote:
> 
> On Aug 17, 2010, at 12:43 AM, Adrian Buehlmann wrote:
> 
>> On 15.08.2010 21:03, Klaus Koch wrote:
>> [snip]
>>> The repository and bug tracking of snap can be found at:
>>>
>>> http://bitbucket.org/kuk42/hgsnap/wiki/Home
>>
>> Wow. ~4000 lines of Python code!
>>
>> My first, quick thought: as a user, I wouldn't want to depend on that
>> much extension code for storing my crown jewels in Mercurial repos (let alone
>> maintain that code...).
> Oh please, no FUD :)  Well, if you look beyond the boilerplate code for all the wrapping, most of the functionality could be implemented as small patches to Mercurial's functions.  However, one would most likely first want to see how this works out as an extension, if only regarding the workflow and general approach.  I wrote some additional commands for the bigfiles extension; this resulted in ~2000 lines of Python code, never really worked, and was a maintenance nightmare.

FUD? We will see :)

>> As an aside: Your hybridencode function looks questionable to me:
>>
>> def hybridencode(path):
>>    """stored snap file names get the same handling as Mercurial data files"""
>>    hpath = _hybridencode('data/'+path)
>>    if hpath.startswith('dh/'):
>>        return hpath[len('dh/'):]
>>    elif hpath.startswith('data/'):
>>        return hpath[len('data/'):]
>>    raise util.Abort(
>>        _("this Mercurial's hybridencode returns unexpected path encoding"))
>>
>> Mapping dh/ and data/ store paths into the same namespace may actually
>> produce filename collisions.
>>
>> Which is one of the reasons why I separated these into dh/ and data/ in
>> mercurial.store.hybridencode.
> Hm, I just wanted to save the additional path.  Still, no collision can happen.  First, the sha1 of the content is added to the original path as an extension.  Since the extension is kept, we would see the same sha1 only if the content were identical, and then it does not matter whether the names clash: we just save storage.  If the content were different, snap would add an integer to the sha1.
> 
> Nevertheless, I may remove that code.  It is not really needed and we save some lines of Python.
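
To illustrate the namespace concern from my earlier mail, here is a minimal
sketch. The two store paths are made up for the example and are not real
output of mercurial.store.hybridencode, and strip_store_prefix is a
hypothetical stand-in for the prefix stripping in the quoted function.
Whether this can bite in snap depends on the sha1 extension you describe.

```python
def strip_store_prefix(hpath):
    """Hypothetical stand-in for the prefix stripping in the quoted wrapper."""
    for prefix in ("dh/", "data/"):
        if hpath.startswith(prefix):
            return hpath[len(prefix):]
    raise ValueError("unexpected path encoding: %r" % hpath)


# Two distinct, made-up store paths (assumed values, not real encoder output).
a = "data/ab/cdef.i"
b = "dh/ab/cdef.i"

# After stripping, both land on the same name in the merged namespace.
collided = strip_store_prefix(a) == strip_store_prefix(b)
```

The paths differ only in their top-level directory, so dropping that
directory removes the one thing that kept them apart.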

If this extension is intended to be used on Windows, keep in mind that
path lengths there are limited. That was one of the reasons why
mercurial.store.hybridencode is designed the way it is.

So you might as well just use the SHA-1 of the contents for the
filenames (like git's blobs). Now is a good time to do that, before you
deploy this extension...


