Windows people: please help check idea for a new Mercurial repository layout

Adrian Buehlmann adrian at cadifra.com
Mon Jun 16 02:46:36 CDT 2008


On 16.06.2008 02:59, Matt Mackall wrote:
> On Sun, 2008-06-15 at 10:43 -0700, James Walker wrote:
>> Matt Mackall wrote:
>>
>>> It's mostly a problem with path length, actually. Filenames can be 255
>>> characters, but the total path length is limited to 260. Or something
>>> like that. And if someone makes a repository with deeper and deeper
>>> paths over time, most of the directory hierarchy will exist when we hit
>>> the limit.
>> Isn't this only quantitatively different from other OSes, not 
>> qualitatively different?  On the Mac, file names can be 255 Unicode 
>> characters, but MAX_PATH is 1024 (which I think means UTF-8 characters).
> 
> Yes, and it's actually MAX_PATH that's the problem. People are creating
> very deep **pathnames** that exceed Windows' pitiful limit of 260
> characters.  Mercurial currently does everything with absolute paths,
> adds its own .hg/store/data/ in, and escapes all the interesting
> characters, making things worse here. So people may end up having an
> effective MAX_PATH of something like 120, which is a pretty long name,
> but not completely ridiculous.
> 
> If you compare that to a Mac, people can easily create repo pathnames >
> 512 bytes (past "stupidly long" and into "absurdly long"), and I'll have
> absolutely no sympathy for anyone who runs into the 1k limit there.
> 
> We know that NTFS[1] can actually handle paths that are 32K with \\?\
> and probably will allow you to reach files with absolute paths > 260 by
> chdir() + open() without \\?\.

Extremely unlikely (I mean the part after the last "and"). Not because of NTFS,
but because of the higher software layers (the Python library).

And you will have to use the ...W functions of win32file for *all* disk
access inside .hg, passing every path as an absolute path (drive letter
included) that uses only '\' as path separator, in a Unicode string object
prefixed with "\\?\".
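
To make the requirements above concrete, here is a minimal sketch of building
such a path. The helper name to_long_path is hypothetical, and the sketch
assumes drive-letter paths only (no UNC shares). It uses the standard ntpath
module, so it can be tried on any platform. Windows does not normalize paths
that carry the "\\?\" prefix, so '/' and '..' have to be resolved before the
prefix is added.

```python
import ntpath

LONG_PREFIX = "\\\\?\\"  # the four characters \\?\


def to_long_path(path):
    """Return `path` in the prefixed long-path form (hypothetical helper).

    Assumption: `path` is absolute and starts with a drive letter,
    e.g. "C:/repo/.hg/store". UNC paths are not handled in this sketch.
    """
    if path.startswith(LONG_PREFIX):
        return path  # already prefixed
    drive, rest = ntpath.splitdrive(path)
    if len(drive) != 2 or drive[1] != ":" or not rest.startswith(("\\", "/")):
        raise ValueError("need an absolute path with a drive letter: %r" % path)
    # normpath turns '/' into '\' and collapses '.' and '..' components
    return LONG_PREFIX + ntpath.normpath(path)


print(to_long_path("C:/repo/.hg/store/data/foo.i"))
```

Every open, stat, rename and unlink below .hg would have to go through
something like this, which is why it is more than a quick hack.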

This is doable, but certainly does not qualify as a "quick hack".

So PyWin32 would become a mandatory dependency (not really an issue,
I just mention it).

> 255 should be a comfortable limit for individual **filenames**. The
> worst case is something like "日本国" which goes from 6 UTF-16 bytes or
> 9 UTF-8 bytes to "~e6~97~a5~e6~9c~ac~e5~9b~bd.i" (29 bytes), an
> expansion factor that limits such filenames to ~28 characters. As that's
> enough for a haiku or two[2], I don't think that's a serious problem.
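
The arithmetic in the quoted paragraph can be checked with a short sketch. The
function escape_name is hypothetical. It assumes that every byte outside
printable ASCII (and '~' itself) is written as "~xx" and that ".i" is appended,
which is inferred from the quoted example only and is not Mercurial's full
store encoding.

```python
def escape_name(name):
    """Escape a filename byte by byte (hypothetical, simplified scheme)."""
    out = []
    for b in name.encode("utf-8"):
        if b < 0x20 or b >= 0x7e:
            out.append("~%02x" % b)  # one byte becomes three characters
        else:
            out.append(chr(b))
    return "".join(out) + ".i"


encoded = escape_name("日本国")
print(encoded, len(encoded))

# Each of these characters is 3 UTF-8 bytes, so 9 escaped bytes per character.
# With a 255 byte limit and 2 bytes for ".i", that leaves room for:
print((255 - 2) // 9)
```

Under these assumptions the three-character name becomes 29 bytes, and the
255 byte filename limit is reached at 28 such characters.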

I have to admit that it is a shame that the full power of NTFS
is crippled behind such a lousy explorer.exe on Windows.

However, I would appreciate it if we could do the reserved name encoding anyway,
even though long paths [1] can, in theory, be used to write reserved names.

This would at least solve the silly viral reserved name trap that
mixed-platform projects are facing with Mercurial today.
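
As an illustration of what such an encoding could look like, here is a minimal
sketch. The scheme (escaping the third character of a reserved name as "~xx")
and the name encode_reserved are assumptions made for this example, not an
agreed design. The list of reserved device names is the usual Windows one:
CON, PRN, AUX, NUL, COM1 to COM9 and LPT1 to LPT9, with or without an
extension.

```python
RESERVED = {"con", "prn", "aux", "nul"}
RESERVED |= {"com%d" % i for i in range(1, 10)}
RESERVED |= {"lpt%d" % i for i in range(1, 10)}


def encode_reserved(path):
    """Escape Windows reserved names in a '/'-separated store path.

    Hypothetical scheme: the third character of a reserved component
    is replaced by "~xx" (its hex code), so "aux.c" becomes "au~78.c".
    """
    parts = []
    for seg in path.split("/"):
        base = seg.split(".", 1)[0].lower()  # "aux.c" counts as "aux"
        if base in RESERVED:
            seg = seg[:2] + "~%02x" % ord(seg[2]) + seg[3:]
        parts.append(seg)
    return "/".join(parts)


print(encode_reserved("src/aux.c"))
print(encode_reserved("docs/console.txt"))
```

A repository written this way on Linux could contain a tracked file named
aux.c without making the store unusable on Windows.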

(Yes, I'm talking about the fraction of Windows users who can still sail under
the path length limits of explorer.exe).

Provided we can encode the reserved names, I would agree to go the
long path route. But then we will need an hg command for deleting
repositories (neither explorer nor cmd.exe will be able to do that).
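
The core of such a command could be a bottom-up walk, sketched below. The name
remove_tree is hypothetical. The sketch assumes a Python whose os functions
call the wide-character Windows API, and that on Windows the caller passes the
root already in "\\?\" form. It ignores read-only files and files that are in
use.

```python
import os


def remove_tree(root):
    """Delete `root` and everything below it (hypothetical helper).

    On Windows, `root` is assumed to be a "\\\\?\\"-prefixed absolute
    path so that entries beyond the 260 character limit stay reachable.
    """
    # topdown=False yields the deepest directories first, so every
    # directory is already empty by the time it is removed
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            os.remove(os.path.join(dirpath, name))
        for name in dirnames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                os.remove(full)  # remove the link itself, not its target
            else:
                os.rmdir(full)
    os.rmdir(root)
```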

> [1] VFAT can probably handle extremely deep paths as well - the limit is
> generally due to operating system buffer sizes and not anything inherent
> in the on-disk structures.
> [2] Basho's best known "old pond" haiku, at 10 characters, encodes to 92
> bytes.
> 

(nothing to respond here, just didn't want to delete anything)

[1] terminology per http://blogs.msdn.com/bclteam/archive/2007/02/13/long-paths-in-net-part-1-of-3-kim-hamilton.aspx

