Windows long path experimenting report

Adrian Buehlmann adrian at cadifra.com
Sat Jun 21 03:53:32 CDT 2008


On 20.06.2008 16:51, Peter Arrenbrecht wrote:
> In the hope of further stimulating the discussion, I have another
> proposal, even though it is not fully satisfactory to me at this
> point. It follows Adrian's and Patrick's idea that maybe we should
> accept the need for a different format and, thus, manual intervention
> when filesystem-level interoperability is desired[1]. This could also
> be applied to your proposal, meaning it would be optional to turn on
> the encoding you propose.
> 
> 
> Proposal:
> 
> We use Adrian's ubar encoding (basically _ + present encoding for each
> path component) for sufficiently short paths, and otherwise fall back
> on a hashing scheme. Of course, we need a quick way to define
> "sufficiently short" that, for a given repo, is fixed across
> platforms. So we just configure the repo with a fixed max path length
> and store this in, for instance, .hg/maxpathlen. Then:
> 
> def encode(path):
> 	ubar = ubarencode(path)
> 	if len(ubar) > repo.maxstorepathlen:
> 		return hashencode(ubar) # or hashencode(path)
> 	return ubar
> 
> def hashencode(path):
> 	# note there's no _ before hashed, so this cannot collide with
> 	# ubar encoded names
> 	return 'hashed/' + path[:10] + md5.md5(path).hexdigest()
> 
> Windows would, by default, apply a limit of 260/2 (a heuristic - see
> below). Linux would apply none, but in any case we should add an
> option to `hg clone` that specifies it. Might also add variants where
> we specify `hg clone --sharable-by win,unix` which computes an
> appropriate max path length.

A very interesting proposal (assuming it is used with a fixed "limit", as
per mpm).

As I understand it, the term "limit" (130 by parren, 200 by mpm) may be a bit
misleading, though. The parren repo layout does not *limit* the paths that can
be stored. The number is merely a threshold: when a tracked (repo-relative)
path exceeds it, the encoding switches to hashing, so the filelog *can* still
be stored for paths that would exceed that "limit" if stored under the
unhashed path. That is a drastic improvement over the current layout, which
simply aborts in that case (on Windows).
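
To make the threshold behaviour concrete, here is a minimal runnable sketch of
the quoted proposal, ported to Python 3 with hashlib. The ubarencode() below is
only a hypothetical stand-in that prefixes each path component with "_"; the
actual ubar encoding also applies Mercurial's existing escaping, which is left
out here. The value 130 is parren's suggested Windows default, used purely as
an illustration.

```python
import hashlib

THRESHOLD = 130  # switch-over number; 130 is parren's suggested Windows default


def ubarencode(path):
    # Hypothetical stand-in: only prefixes each component with "_".
    # The real ubar encoding would also apply Mercurial's existing escaping.
    return "/".join("_" + part for part in path.split("/"))


def hashencode(path):
    # No "_" before "hashed", so this cannot collide with ubar-encoded names.
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    return "hashed/" + path[:10] + digest


def encode(path, threshold=THRESHOLD):
    ubar = ubarencode(path)
    if len(ubar) > threshold:
        # Past the threshold the path is still stored, just under a hashed name.
        return hashencode(ubar)
    return ubar


short = "src/main.c"
long_path = "/".join(["some_rather_long_directory_name"] * 6) + "/file.txt"

print(encode(short))           # _src/_main.c
print(len(long_path))          # 200
print(len(encode(long_path)))  # 49
```

The 200-character path is not rejected; it simply ends up under a store name of
fixed length (7 + 10 + 32 characters), which is the point made above about the
"limit" being a switch rather than a cap.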

As such, I don't understand why Paul (in another post) wants to set that
"watermark" magic number to MAX_PATH, which I take to be 260 on Windows, as in
Peter's 260/2 heuristic above (quote: "Basically, I'm against enforcing a limit
other than MAX_PATH").

There should be some headroom for the repo root plus the length of
".hg/store/data". Also, IIRC, Mercurial creates temporary files inside the store
during some operations (pull?), with short random strings appended to the
original path (I am not completely sure about that and will have to recheck). So
setting that switch-the-encoding magic number to MAX_PATH seems bad to me.
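
The headroom argument can be put into numbers with a rough sketch. It takes
MAX_PATH as 260 (the figure behind the 260/2 heuristic in the quoted proposal)
and subtracts the repo root, the ".hg\store\data\" prefix and a temp-file
suffix. The repo root and the suffix length of 8 are made-up values for
illustration; the real suffix length, if there is one, would need checking.

```python
MAX_PATH = 260  # Windows MAX_PATH; this count includes the terminating NUL


def headroom(repo_root, temp_suffix_len=8):
    # temp_suffix_len is an assumed value, not something taken from Mercurial.
    prefix = repo_root.rstrip("\\/") + "\\.hg\\store\\data\\"
    usable = MAX_PATH - 1  # leave one character for the terminating NUL
    return usable - len(prefix) - temp_suffix_len


example_root = r"C:\Users\adrian\projects\myrepo"  # hypothetical location
print(headroom(example_root))  # 204
```

Under these assumptions, only 204 characters remain for the encoded store path,
so a switch-over number equal to MAX_PATH would still let unhashed paths
overflow on such a checkout.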

Or did I misunderstand/confuse something?


More information about the Mercurial-devel mailing list