Solving long paths by hashing

Adrian Buehlmann adrian at cadifra.com
Sun Jun 29 04:43:52 CDT 2008


On 27.06.2008 03:13, Jesse Glick wrote:
> Peter Arrenbrecht wrote:
>> Proposal:
>>
>> [...] fall back on a hashing scheme. [...]
>>
>> def encode(path):
>> 	ubar = ubarencode(path)
>> 	if len(ubar) > repo.maxstorepathlen:
>> 		return hashencode(ubar)
>> 	return ubar
>>
>> def hashencode(path):
>> 	return 'hashed/' + path[:10] + md5.md5(path).hexdigest()
>
> You need to do some more work, since currently serving the repo tries to
> iterate over storage files and this assumes a reversible encoding.
>
> For a working patch to play with:
>
> http://www.selenic.com/mercurial/bts/file520/prevent-excessively-long-repo-paths.diff
>
> Doesn't apply cleanly against current sources, but probably easy enough
> to fix up.

Thanks Jesse.

I've just taken a closer look at "the repo tries to iterate over storage
files and this assumes a reversible encoding".

Of course, Jesse is right.
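
To make the reversibility point concrete, here is a minimal sketch (written
for Python 3, so hashlib instead of the md5 module). The ubarencode/ubardecode
pair is a simplified stand-in for illustration only, not the actual
util.encodefilename/util.decodefilename, and the maxstorepathlen default is an
arbitrary assumption. hashencode follows Peter's proposal quoted above.

```python
import hashlib


def ubarencode(path):
    """Simplified, reversible stand-in: '_' -> '__', ASCII 'X' -> '_x'."""
    out = []
    for ch in path:
        if ch == "_":
            out.append("__")
        elif "A" <= ch <= "Z":
            out.append("_" + ch.lower())
        else:
            out.append(ch)
    return "".join(out)


def ubardecode(encoded):
    """Inverse of ubarencode; expects a string produced by ubarencode."""
    out = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if ch == "_":
            nxt = encoded[i + 1]
            out.append("_" if nxt == "_" else nxt.upper())
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def hashencode(path):
    """Hash fallback as in the proposal: short prefix plus MD5 hex digest."""
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    return "hashed/" + path[:10] + digest


def encode(path, maxstorepathlen=120):
    """Use the reversible encoding unless the result is too long.

    The limit of 120 is an assumed value for this sketch.
    """
    ubar = ubarencode(path)
    if len(ubar) > maxstorepathlen:
        return hashencode(ubar)
    return ubar
```

ubardecode(ubarencode(p)) gives back p, but a hashed name keeps only the
first 10 characters and a digest of the encoded path, so no decode function
can recover the original name from it.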

Decoding of path names is currently done by util.decodefilename:

> grep -n decodefilename *.py
localrepo.py:66:            self.decodefn = util.decodefilename
statichttprepo.py:59:            self.decodefn = util.decodefilename
util.py:1374:encodefilename, decodefilename = _buildencodefun()

which is assigned to "repo".decodefn:

> grep -n decodefn *.py
localrepo.py:66:            self.decodefn = util.decodefilename
localrepo.py:70:            self.decodefn = lambda x: x
statichttprepo.py:59:            self.decodefn = util.decodefilename
statichttprepo.py:63:            self.decodefn = lambda x: x
streamclone.py:78:            name = repo.decodefn(util.pconvert(name))

"repo".decodefn is used by streamclone.stream_out. Excerpt from
streamclone.py, line 77:

'''
        for name, size in walkrepo(repo.spath):
            name = repo.decodefn(util.pconvert(name))
            entries.append((name, size))
            total_bytes += size
'''
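
The dependency can be sketched in isolation roughly as follows (Python 3).
walkstore and pconvert are hypothetical stand-ins for walkrepo and
util.pconvert, decodefn is a simplified decoder rather than
util.decodefilename, and the file under hashed/ uses a made-up digest.

```python
import os
import re
import tempfile


def walkstore(root):
    """Yield (name relative to root, size) for every file below root."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for fn in sorted(filenames):
            full = os.path.join(dirpath, fn)
            yield os.path.relpath(full, root), os.path.getsize(full)


def pconvert(path):
    """Convert the OS path separator to '/'."""
    return path.replace(os.sep, "/")


def decodefn(name):
    """Simplified decoder: '__' -> '_', '_x' -> 'X'."""
    return re.sub(
        r"_(.)",
        lambda m: "_" if m.group(1) == "_" else m.group(1).upper(),
        name,
    )


def collect_entries(root):
    """Same shape as the loop in the excerpt: walk, decode, accumulate."""
    entries = []
    total_bytes = 0
    for name, size in walkstore(root):
        name = decodefn(pconvert(name))
        entries.append((name, size))
        total_bytes += size
    return entries, total_bytes


def demo():
    """Build a tiny fake store and collect its entries."""
    with tempfile.TemporaryDirectory() as store:
        os.makedirs(os.path.join(store, "data"))
        os.makedirs(os.path.join(store, "hashed", "data"))
        with open(os.path.join(store, "data", "_r_e_a_d_m_e.i"), "wb") as f:
            f.write(b"12345")
        # Made-up hashed name; a real one would end in an MD5 hex digest.
        with open(os.path.join(store, "hashed", "data", "_very0123abcd"), "wb") as f:
            f.write(b"abc")
        return collect_entries(store)
```

Here the first file decodes back to data/README.i, while the hashed one just
gets run through the decoder and comes out as a name that never existed in
the working directory. That is the problem with a non-reversible encoding.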

streamclone.stream_out is used by sshserver:

> grep -n streamclone *.py
sshserver.py:11:import os, streamclone, sys, tempfile, util, hook
sshserver.py:207:        streamclone.stream_out(self.repo, self.fout)
streamclone.py:1:# streamclone.py - streaming clone server support for mercurial


So sshserver, through streamclone.stream_out, depends on the path encoding
being reversible.

sshserver is used by commands.serve when --stdio is specified; that mode
serves remote clients doing e.g. pull or push over ssh.

Open questions:
Does streamclone really need to walk the store like that?
Would it be possible to eliminate this use of util.decodefilename?


