Solving long paths by hashing

Sun Jun 29 06:57:04 CDT 2008

On 29.06.2008 12:28, Dirkjan Ochtman wrote:
> Adrian Buehlmann wrote:
>> Questions left:
>> Does streamclone really need to walk the store like that?
>> Would it be possible to eliminate this use of util.decodefilename?
> 
> Well, I think what it does is walking all the files and passing their 
> name, size and contents to the client so that the client can just save 
> the revlog contents under the appropriate file name, using the encoding 
> that the client hg prefers, so there's no way around that, really.
> 
> The alternative is discovering all filenames in some other way than 
> walking the store, but it seems that would involve either walking 
> manifests for all changesets in the changelog or reading each filelog, 
> checking out the manifest in which it last appeared, then read that 
> manifest to find any files that are still missing, or something. Both of 
> these aren't going to be very efficient, it seems.

Thanks Dirkjan.

Just got another idea:

Instead of writing a reverse mapping of encoded -> unencoded filenames
into a single file as Jesse's patch does (the "longnames" file), we could:

Prepend a new prefix to the content of every name-hashed *.i file in the
store, consisting of

a) a new revlog-header
b) followed by the unencoded filename
c) followed by some delimiter

and then followed by whatever *.i files currently contain (somewhat similar
to adding another layer to a protocol).

We could then read the decoded filename from the beginning of the *.i file
and skip the new prefix.
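
To make the prefix idea concrete, here is a rough sketch in Python. It is
not taken from any patch. The MAGIC and DELIM values and the names
add_prefix and split_prefix are made up for illustration, since the actual
header and delimiter are still undecided.

```python
import io

# Hypothetical values: the header and the delimiter are not decided yet.
MAGIC = b"HGNAME1\n"  # stands in for "a new revlog-header"
DELIM = b"\0"         # stands in for the delimiter; assumed never to occur in a filename


def add_prefix(unencoded_name, index_data):
    """Return the new *.i content: header, unencoded name, delimiter, old content."""
    if DELIM in unencoded_name:
        raise ValueError("filename contains the delimiter")
    return MAGIC + unencoded_name + DELIM + index_data


def split_prefix(fp):
    """Read the prefix from fp and return the unencoded filename.

    On return, fp is positioned at the first byte of the old *.i content,
    so the caller can keep reading as if the prefix were not there.
    """
    if fp.read(len(MAGIC)) != MAGIC:
        raise ValueError("missing name prefix")
    name = bytearray()
    while True:
        ch = fp.read(1)
        if not ch:
            raise ValueError("truncated name prefix")
        if ch == DELIM:
            return bytes(name)
        name += ch


blob = add_prefix(b"data/Some/Very/Long/Name.txt", b"\x00\x01old index bytes")
fp = io.BytesIO(blob)
name = split_prefix(fp)  # the unencoded filename
rest = fp.read()         # whatever the *.i file contained before
```

Only the first delimiter after the header ends the name, so the old
content may itself contain that byte without confusing the reader.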

After all, we create a new repo layout anyway, so we can change
the way we store *.i files.

For example, streamclone.stream_out reads the *.i files anyway, so it
would be efficient for stream_out to extract the unencoded filename
from the *.i file it is about to send.
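
A rough sketch of how that could look, again in Python. This is not the
real streamclone.stream_out code. The function names stream_entries and
read_unencoded_name, the MAGIC and DELIM values, and the assumption that
the directory being walked holds only name-hashed *.i files are all
hypothetical.

```python
import os

# Hypothetical prefix layout: header, unencoded filename, delimiter.
MAGIC = b"HGNAME1\n"
DELIM = b"\0"


def read_unencoded_name(fp):
    """Consume the prefix from fp and return the unencoded filename."""
    if fp.read(len(MAGIC)) != MAGIC:
        raise ValueError("missing name prefix")
    name = bytearray()
    while True:
        ch = fp.read(1)
        if not ch:
            raise ValueError("truncated name prefix")
        if ch == DELIM:
            return bytes(name)
        name += ch


def stream_entries(hashed_dir):
    """Yield (unencoded name, size, content) for each *.i file under hashed_dir.

    The name comes from the file itself, so the hashed on-disk name never
    has to be decoded.
    """
    for dirpath, dirnames, filenames in os.walk(hashed_dir):
        dirnames.sort()  # walk subdirectories in a stable order
        for fn in sorted(filenames):
            if not fn.endswith(".i"):
                continue
            with open(os.path.join(dirpath, fn), "rb") as fp:
                name = read_unencoded_name(fp)
                content = fp.read()  # the file was going to be read anyway
            yield name, len(content), content
```

The point is that the name lookup costs no extra file access: the file
is opened once and the name is simply the first thing read from it.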