Solving long paths by hashing
Adrian Buehlmann
adrian at cadifra.com
Sun Jun 29 04:43:52 CDT 2008
On 27.06.2008 03:13, Jesse Glick wrote:
> Peter Arrenbrecht wrote:
>> Proposal:
>>
>> [...] fall back on a hashing scheme. [...]
>>
>> def encode(path):
>>     ubar = ubarencode(path)
>>     if len(ubar) > repo.maxstorepathlen:
>>         return hashencode(ubar)
>>     return ubar
>>
>> def hashencode(path):
>>     return 'hashed/' + path[:10] + md5.md5(path).hexdigest()
>
> You need to do some more work, since currently serving the repo tries to
> iterate over storage files and this assumes a reversible encoding.
>
> For a working patch to play with:
>
> http://www.selenic.com/mercurial/bts/file520/prevent-excessively-long-repo-paths.diff
>
> Doesn't apply cleanly against current sources, but probably easy enough
> to fix up.
Thanks Jesse.
I've just had a closer look into "the repo tries to iterate over storage
files and this assumes a reversible encoding".
Of course, Jesse is right.
Decoding of path names is currently done by util.decodefilename:
> grep -n decodefilename *.py
localrepo.py:66: self.decodefn = util.decodefilename
statichttprepo.py:59: self.decodefn = util.decodefilename
util.py:1374:encodefilename, decodefilename = _buildencodefun()
which is assigned to "repo".decodefn:
> grep -n decodefn *.py
localrepo.py:66: self.decodefn = util.decodefilename
localrepo.py:70: self.decodefn = lambda x: x
statichttprepo.py:59: self.decodefn = util.decodefilename
statichttprepo.py:63: self.decodefn = lambda x: x
streamclone.py:78: name = repo.decodefn(util.pconvert(name))
"repo".decodefn is used by streamclone.stream_out. Excerpt from
streamclone.py, line 77:
'''
    for name, size in walkrepo(repo.spath):
        name = repo.decodefn(util.pconvert(name))
        entries.append((name, size))
        total_bytes += size
'''
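To make the shape of that loop concrete, here is a minimal, self-contained sketch of the same pattern. It is not Mercurial's code: walk_store is a hypothetical stand-in for walkrepo built on os.walk, and collect_entries is a made-up name wrapping the loop. Only the structure (walk the store, decode every on-disk name, sum the sizes) is taken from the excerpt.

```python
import os


def walk_store(root):
    """Yield (name, size) for every file below root.

    Hypothetical stand-in for walkrepo: names are relative to root and
    use '/' as separator, files and directories are visited in sorted order.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for fn in sorted(filenames):
            full = os.path.join(dirpath, fn)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            yield rel, os.path.getsize(full)


def collect_entries(root, decodefn):
    """Mirror the excerpt: decode each store name and total the sizes."""
    entries = []
    total_bytes = 0
    for name, size in walk_store(root):
        # The only source of file names is the store itself, so each
        # on-disk name must be mapped back to the original path here.
        entries.append((decodefn(name), size))
        total_bytes += size
    return entries, total_bytes
```

Calling collect_entries(root, lambda x: x) corresponds to the identity decodefn seen in the grep output above. The point of the sketch is that the walk starts from on-disk names only, so whatever decodefn is passed in has to be able to recover the original path from the stored one.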
streamclone.stream_out is used by sshserver:
> grep -n streamclone *.py
sshserver.py:11:import os, streamclone, sys, tempfile, util, hook
sshserver.py:207: streamclone.stream_out(self.repo, self.fout)
streamclone.py:1:# streamclone.py - streaming clone server support for mercurial
Hence sshserver, through streamclone.stream_out, depends on the path encoding being reversible.
sshserver is used by commands.serve if --stdio is specified, which is used
for remote clients doing e.g. pull or push over ssh.
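Why the hashing fallback breaks this can be shown with a small sketch. Everything here is an assumption for illustration: toy_encode/toy_decode are a simplified reversible encoding (uppercase letter -> "_" + lowercase, "_" -> "__"), not util.encodefilename itself; MAXSTOREPATHLEN is a made-up limit; and hashlib.md5 stands in for the md5 module used in the proposal.

```python
import hashlib

MAXSTOREPATHLEN = 60  # hypothetical limit, chosen only for this demo


def toy_encode(path):
    """Reversible toy encoding: 'A' -> '_a', '_' -> '__'."""
    out = []
    for c in path:
        if c == "_":
            out.append("__")
        elif "A" <= c <= "Z":
            out.append("_" + c.lower())
        else:
            out.append(c)
    return "".join(out)


def toy_decode(name):
    """Inverse of toy_encode."""
    out = []
    i = 0
    while i < len(name):
        if name[i] == "_" and i + 1 < len(name):
            nxt = name[i + 1]
            out.append("_" if nxt == "_" else nxt.upper())
            i += 2
        else:
            out.append(name[i])
            i += 1
    return "".join(out)


def hashencode(path):
    """Proposed fallback: 10-character prefix plus an MD5 hex digest."""
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    return "hashed/" + path[:10] + digest


def encode(path):
    """Use the reversible encoding unless the result is too long."""
    enc = toy_encode(path)
    if len(enc) > MAXSTOREPATHLEN:
        return hashencode(enc)
    return enc


short = "data/README.i"
long_path = "data/" + "x" * 100 + "/A.i"

print(toy_decode(encode(short)) == short)          # short path round-trips
print(encode(long_path).startswith("hashed/"))     # long path takes the fallback
print(toy_decode(encode(long_path)) == long_path)  # and cannot be decoded back
```

For the short path the stored name decodes back to the original. For the long path only a 10-character prefix and a digest are stored, so no decodefn can reconstruct the original name from the store alone, which is exactly what the walk in stream_out relies on.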
Questions left:
Does streamclone really need to walk the store like that?
Would it be possible to eliminate this use of util.decodefilename?