[PATCH RFC] store: _hashencode2 escapes names more simply

Adrian Buehlmann adrian at cadifra.com
Tue Sep 25 16:59:42 CDT 2012


On 2012-09-25 20:00, Bryan O'Sullivan wrote:
> # HG changeset patch
> # User Bryan O'Sullivan <bryano at fb.com>
> # Date 1348596048 25200
> # Node ID 11340f1b969556646ee853fe3aa5c33e01c535ba
> # Parent  d42cc3c880b69d0ba769082dc28fd642568df7e1
> store: _hashencode2 escapes names more simply
> 
> This is pretty dang simple:
> * hash the prefix
> * basic encode the prefix
> * trim each path component to 8 bytes, fix up trailing space or dot
> * limit length
> * tack on hash and suffix
> 
> diff --git a/mercurial/store.py b/mercurial/store.py
> --- a/mercurial/store.py
> +++ b/mercurial/store.py
> @@ -216,6 +216,23 @@ def _hashencode(path, dotencode):
>          res = 'dh/' + dirs + filler + digest + ext
>      return res
>  
> +def _hashencode2(path):
> +    prefix, suffix = path[:-2], path[-2:]
> +    digest = _sha(prefix).hexdigest()
> +    basic = _auxencode(_encodefname(encodedir(prefix)).split('/'), True)
> +    def trim(f):
> +        if len(f) >= 8:
> +            if f[7] in ' .':
> +                return f[:7] + '~'
> +            return f[:8]
> +        return f
> +    parts = map(trim, basic)
> +    totallen = sum(map(len, parts)) + len(parts) + 41
> +    while parts and totallen > _maxstorepathlen:
> +        p = parts.pop()
> +        totallen -= len(p)
> +    return '/'.join(parts) + digest + suffix
> +
>  def _hybridencode(path, dotencode):
>      '''encodes path with a length limit

With my parsers.cutdirs this would roughly be:

def _hashencode2(path):
    prefix, suffix = path[:-2], path[-2:]
    digest = _sha(prefix).hexdigest()
    return 'dh/' + parsers.cutdirs(path)[5:] + digest + suffix

I'd still split off the "data/" prefix and tack "dh/" on in front. This
also avoids collisions between directories of hashed paths and files of
non-hashed paths (a directory xx/foo.i/ colliding with a file
xx/foo.i).
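
To make the collision concrete, here is a small sketch. The helper
dir_file_collisions and the all-zero digest are made up for illustration
and are not part of store.py. It only checks whether one store path would
have to exist as a directory while another store path needs the same name
as a file.

```python
def dir_file_collisions(store_paths):
    """Return names that one path needs as a file and another as a directory.

    Hypothetical helper for illustration only, not part of Mercurial.
    """
    files = set(store_paths)
    clashes = set()
    for p in store_paths:
        parts = p.split("/")
        # every proper prefix of p has to exist as a directory
        for i in range(1, len(parts)):
            parent = "/".join(parts[:i])
            if parent in files:
                clashes.add(parent)
    return sorted(clashes)

digest = "0" * 40                  # placeholder, not a real SHA-1 digest
plain = "data/xx/foo.i"            # non-hashed revlog of tracked file xx/foo

# hashed path of some long name below a tracked directory xx/foo.i/
hashed_in_data = "data/xx/foo.i/" + digest + ".i"
hashed_in_dh = "dh/xx/foo.i/" + digest + ".i"

in_data = dir_file_collisions([plain, hashed_in_data])
in_dh = dir_file_collisions([plain, hashed_in_dh])
```

With both paths below data/, in_data reports data/xx/foo.i as needed both
as a file and as a directory. With the hashed path below dh/, in_dh is
empty, because the two kinds of paths never share a parent directory.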

As for Bryan's _hashencode2: the encodedir call won't help, because the
.hg ending that encodedir adds may be cut off again by the directory
truncation (as discussed before).
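
A minimal sketch of that effect, under stated assumptions:
encodedir_sketch is a simplified stand-in for encodedir that only does the
.hg/.i/.d directory escaping, trim is copied from the quoted patch, and
the _encodefname and _auxencode steps are left out because they would not
change this particular all-lowercase name. The path itself is short and
only serves to show the truncation step.

```python
def encodedir_sketch(path):
    # simplified stand-in for encodedir: escape directory names that
    # end in .hg, .i or .d so they cannot clash with store files
    return (path.replace(".hg/", ".hg.hg/")
                .replace(".i/", ".i.hg/")
                .replace(".d/", ".d.hg/"))

def trim(f):
    # same rule as trim() in the quoted patch: keep at most 8 bytes,
    # replace a trailing space or dot with '~'
    if len(f) >= 8:
        if f[7] in ' .':
            return f[:7] + '~'
        return f[:8]
    return f

path = "data/abcdef.i/some/file.i"   # tracked directory named "abcdef.i"
prefix = path[:-2]                   # strip the ".i" suffix, as the patch does

escaped = encodedir_sketch(prefix)   # directory becomes "abcdef.i.hg"
trimmed = "/".join(trim(p) for p in escaped.split("/"))
```

Here escaped contains the directory abcdef.i.hg, but after trimming to 8
bytes the component is abcdef.i again, which is exactly the name of the
revlog file for a tracked file abcdef in the same directory.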

