Windows long path experimenting report

Peter Arrenbrecht peter.arrenbrecht at gmail.com
Fri Jun 20 07:48:23 CDT 2008


On Fri, Jun 20, 2008 at 2:22 PM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 20/06/2008, Patrick Mézard <pmezard at gmail.com> wrote:
>> Yes, that's what I am starting to think after Adrian's tests. In which case, we
>> should implement a hash-based (or anything similar) encoding strategy, use it
>> under Windows and document that stores are not always compatible across
>> filesystems.
>
> I've asked before, but I'll ask again, as I don't think I've seen a
> really clear answer:
>
> What precisely is the requirement that excludes hashing? Matt's
> description seems to indicate that he wants store filenames to sort
> the same way as working copy filenames, as closely as possible. Is
> that the requirement? If so, something like the first few characters
> plus a hash would probably do. As a straw-man proposal: the first 10
> characters of the filename, plus an MD5 hex digest:
>
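> import hashlib
> def encode(name):
>    # first 10 chars of the name, then the 32-char MD5 hex digest
>    return name[:10] + hashlib.md5(name.encode("utf-8")).hexdigest()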
>
> That's no more than 42 characters, it sorts almost the same as the
> original, and it avoids reserved names. It's not reversible, but that
> property was optional, from what I recall. For added robustness, strip
> out any non-portable characters like colon, question mark, slash,
> backslash, etc. That has a minor effect on sortability, but that's the
> case at the moment anyway.

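If `name` is a full path, then this will not sort much like the original
directory structure: only the first 10 characters of the path survive,
and everything after them is ordered by the hash.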
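
A small sketch of that effect, assuming the straw-man is applied to whole
paths. The paths below are made up for illustration, and `encode` is the
quoted proposal ported to hashlib, not anything taken from Mercurial:

```python
import hashlib

def encode(name):
    # Straw-man from the quoted mail, using hashlib instead of the old md5 module.
    return name[:10] + hashlib.md5(name.encode("utf-8")).hexdigest()

# Hypothetical store paths: two files in one directory, one in a sibling.
paths = [
    "data/project/module/alpha.i",
    "data/project/module/beta.i",
    "data/project/other/gamma.i",
]
encoded = {p: encode(p) for p in paths}

# Every encoded name keeps only "data/proje" of the original path ...
prefixes = {e[:10] for e in encoded.values()}

# ... so the order among them is decided entirely by the digests.
for p in sorted(paths, key=encode):
    print(encoded[p], "<-", p)
```

All three names share the same 10-character prefix, so which directory a
file lived in no longer has any influence on where it sorts.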

If it's just a component name, then we'll soon bump against the 260 (or so)
max path length limit again (260 / 42 ~= 6 = max folder nesting
depth). So at least when the component name is shorter than 42 chars, we
should drop the encoding and use the plain name. We would then have to
disambiguate hashed and non-hashed names, so hashed names should maybe
always contain a special char, and any plain name that contains said
char would also get hashed automatically, or something.
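
Roughly, as a sketch only: the marker character `~`, the constant names and
the helper names below are all my own assumptions for illustration, not an
agreed format.

```python
import hashlib

MARKER = "~"      # hypothetical special char reserved for hashed names
PREFIX_LEN = 10   # leading chars kept for rough sortability
MAX_PLAIN = 42    # components shorter than this stay as they are

def encode_component(name):
    # Short names without the marker are stored unchanged.
    if len(name) < MAX_PLAIN and MARKER not in name:
        return name
    # Everything else becomes prefix + marker + MD5 hex digest, so any
    # stored name containing the marker is known to be a hashed one.
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return name[:PREFIX_LEN] + MARKER + digest

def encode_path(path):
    # Encode each component separately so the directory structure survives.
    return "/".join(encode_component(c) for c in path.split("/"))

print(encode_path("src/" + "x" * 60 + "/README.txt"))
```

With these assumptions a hashed component is at most 10 + 1 + 32 = 43
characters, one more than the straw-man because of the marker.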

But still, I don't think it would really solve the problem.

>
> I'm very concerned to get this right, as Matt stated that it had a
> major effect on performance - but without either a checkable
> requirement, or a (relatively simple) test to see if an alternative is
> OK, I don't see how we can make progress.
>
> Paul