Solving long paths by hashing

Sun Jun 29 13:25:43 CDT 2008

On 29.06.2008 19:47, Adrian Buehlmann wrote:
> On 29.06.2008 19:20, Dirkjan Ochtman wrote:
>> Adrian Buehlmann wrote:
>>> I assume this is because the hashed files will be stored in a separate
>>> tree in the store, right?
>> I think it's because reading a file is often quicker than walking 
>> directory trees. Instantiating some dict from that can be fast.
> 
> Sure. I knew that. But that's not the point.
> 
>>> So streamclone can exclude that part of the store when it does its walk.
>>> (Of course streamclone will still walk the tree of the non-hashed files,
>>> right?)
>> If we're using all-hashed file names, that might be a good idea. But I 
>> wonder if it wouldn't be possible to try something with unique prefixes.
> 
> Well. Currently streamclone *does* walk the whole tree under
> .hg/store/data.
> 
> I was thinking about separating the files with hashed names in a separate
> tree after reading Matt's response.
> 
> Potential future structure:
> 
> .hg/store/data     contains the tree of non-hashed filenames *only*
> .hg/store/hdat     contains tree with hashed-name files *only*
> 
> If you don't separate those trees (which I assumed first), then you don't
> gain anything in dir-walking speed on streamclone with that longnames file
> *if* streamclone then still walks into subtrees of .hg/store/data to just
> discover that some subdirs only contain hashed files -- which are already
> listed in the longnames file.

BTW you could do away with streamclone dir-walking entirely by having
a file, let's say '.hg/store/filenames', which lists *all* files -- not just
the ones with hashed names.

This would give you the maximum speed gain on streamclone for the whole tree,
at the expense of even more disk space and memory.
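
To make the idea concrete, here is a rough Python sketch. It is not Mercurial
code: the helper names (walk_data, write_index, read_index), the one-path-per-line
format and the UTF-8 encoding are all assumptions of mine, and 'filenames' is just
the hypothetical file name from above. It only shows that a reader of such an
index gets the same list a directory walk would produce, without touching the
tree.

```python
import os

INDEX_NAME = "filenames"  # hypothetical index file, as proposed above


def walk_data(store):
    """List every file under <store>/data by walking the directory tree."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(os.path.join(store, "data")):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Store paths relative to the store root, with '/' separators.
            found.append(os.path.relpath(full, store).replace(os.sep, "/"))
    return sorted(found)


def write_index(store):
    """Write one store-relative path per line (assumed format)."""
    with open(os.path.join(store, INDEX_NAME), "w", encoding="utf-8") as fp:
        for path in walk_data(store):
            fp.write(path + "\n")


def read_index(store):
    """Return the file list from the index, with no directory walk."""
    with open(os.path.join(store, INDEX_NAME), encoding="utf-8") as fp:
        return [line.rstrip("\n") for line in fp if line.strip()]
```

In this sketch the index is rebuilt by walking, which is only for illustration.
In practice it would have to be kept up to date whenever a file is added to the
store, otherwise a reader relying on it would miss files. A line-based format
also cannot represent names containing newlines.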