Remark on store.encodedir and lowerencode for hashed paths
Adrian Buehlmann
adrian at cadifra.com
Fri Oct 5 06:50:45 CDT 2012
Before our recent series of speed refactorings, the code for hybridencode
was (revision 65df60a3f96b):

_maxstorepathlen = 120
_dirprefixlen = 8
_maxshortdirslen = 8 * (_dirprefixlen + 1) - 4

def _hybridencode(path, auxencode):
    if not path.startswith('data/'):
        return path
    # escape directories ending with .i and .d
    path = encodedir(path)
    ndpath = path[len('data/'):]
    res = 'data/' + auxencode(encodefilename(ndpath))
    if len(res) > _maxstorepathlen:
        digest = _sha(path).hexdigest()
        aep = auxencode(lowerencode(ndpath))
        _root, ext = os.path.splitext(aep)
        parts = aep.split('/')
        basename = parts[-1]
        sdirs = []
        for p in parts[:-1]:
            d = p[:_dirprefixlen]
            if d[-1] in '. ':
                # Windows can't access dirs ending in period or space
                d = d[:-1] + '_'
            t = '/'.join(sdirs) + '/' + d
            if len(t) > _maxshortdirslen:
                break
            sdirs.append(d)
        dirs = '/'.join(sdirs)
        if len(dirs) > 0:
            dirs += '/'
        res = 'dh/' + dirs + digest + ext
        spaceleft = _maxstorepathlen - len(res)
        if spaceleft > 0:
            filler = basename[:spaceleft]
            res = 'dh/' + dirs + filler + digest + ext
    return res

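The directory-shortening loop in the middle of that function can be tried in isolation. The following is a minimal sketch that copies only the sdirs loop from the code above; the name shortdirs is hypothetical, and its input is assumed to be an already lowerencoded/auxencoded path like aep.

```python
_dirprefixlen = 8
_maxshortdirslen = 8 * (_dirprefixlen + 1) - 4  # 68


def shortdirs(aep):
    """Hypothetical helper mirroring the sdirs loop of _hybridencode.

    Each directory component is cut to _dirprefixlen characters, a
    trailing '.' or ' ' is replaced by '_', and components are kept
    until the joined prefix would exceed _maxshortdirslen.
    """
    parts = aep.split('/')
    sdirs = []
    for p in parts[:-1]:
        d = p[:_dirprefixlen]
        if d[-1] in '. ':
            # Windows can't access dirs ending in period or space
            d = d[:-1] + '_'
        t = '/'.join(sdirs) + '/' + d
        if len(t) > _maxshortdirslen:
            break
        sdirs.append(d)
    dirs = '/'.join(sdirs)
    if dirs:
        dirs += '/'
    return dirs


print(shortdirs('verylongdirectoryname/foo./bar.txt.i'))  # verylong/foo_/
```

With 8-character components the 68-character budget admits at most 7 directory levels, and no kept component is ever longer than 8 characters.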
For hashed paths, this does:

    path = encodedir(path)
    ndpath = path[len('data/'):]
    aep = auxencode(lowerencode(ndpath))

This "fails" to encode directories ending in .I, .D, .HG, .Hg or .hG,
because encodedir() only escapes the *lowercase* suffixes .i, .d and .hg.
For example, a directory 'foo.I' is lowerencoded to 'foo.i', which
encodedir() never gets to see, because encodedir() has already run by
that point.
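The ordering effect can be reproduced with simplified stand-ins. This is a sketch under stated assumptions: encodedir below only performs the three lowercase suffix replacements, lowerencode only folds A-Z to lowercase (the real one also escapes reserved characters), auxencode is omitted, and hashed_dir_input is a hypothetical name for the three-line sequence quoted above.

```python
def encodedir(path):
    # Simplified stand-in for store.encodedir: escape directory
    # components ending in the lowercase suffixes .hg, .i and .d.
    return (path
            .replace('.hg/', '.hg.hg/')
            .replace('.i/', '.i.hg/')
            .replace('.d/', '.d.hg/'))


def lowerencode(s):
    # Simplified stand-in for store.lowerencode: only folds A-Z to
    # lowercase; reserved-character escaping is left out.
    return ''.join(c.lower() if 'A' <= c <= 'Z' else c for c in s)


def hashed_dir_input(path):
    # Same order as the hashed-path branch: encodedir first, then
    # lowerencode (auxencode omitted in this sketch).
    path = encodedir(path)
    ndpath = path[len('data/'):]
    return lowerencode(ndpath)


print(hashed_dir_input('data/foo.i/bar.txt.i'))  # foo.i.hg/bar.txt.i
print(hashed_dir_input('data/foo.I/bar.txt.i'))  # foo.i/bar.txt.i
```

The lowercase directory gets its .hg escape, while the uppercase one reaches the hashed path as a bare 'foo.i' directory.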
In practice, this doesn't matter: the file name at the end of a hashed
path consists of a forty-character hash followed by .i or .d, so it can
never collide with a directory name there, because directory names in
hashed paths are truncated to at most 8 characters (_dirprefixlen).
See also http://selenic.com/repo/hg/rev/810387f59696, which moved the
encodedir step from filelog into store (2009-05-20).