[PATCH] prevent any file paths under .hg/store/data/ from getting too long

Jesse Glick jesse.glick at sun.com
Thu Dec 20 07:05:28 CST 2007


Background comments about the impl:

Before I started, I assumed this would be an easy fix: simply chop off
some prefix from long store paths, and uniquify with a hash if required.
I also assumed that the manifest would perhaps list triplets of working
(checkout) path, store path, and node ID.

As I was surprised to discover, the manifest in fact only contains pairs 
of working path and node ID; the store path is computed on demand using 
a translation function, which must be repeatable (hence the use of a 
hash to uniquify); and this function must be reversible (for 
streamclone.py). Making it reversible complicates the patch, since it is
then necessary to maintain a separate .hg/store/longnames file from which
the working path can be recovered.
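To make those two constraints concrete, here is a minimal Python sketch of a translation that is both repeatable and reversible. It is not Mercurial's actual store encoding. The names `MAX_STORE_PATH`, `encode_store_path` and `decode_store_path`, the 120-character limit and the `~` separator are all hypothetical, and a plain dict stands in for .hg/store/longnames.

```python
import hashlib

# Hypothetical limit; a real threshold would depend on the target filesystem.
MAX_STORE_PATH = 120

def encode_store_path(path, longnames):
    """Map a working path to a store path, repeatably.

    Short paths pass through unchanged. Long paths keep a truncated tail
    plus a SHA-1 of the full path, so the same input always yields the
    same output. The full path is recorded in `longnames` (a stand-in for
    .hg/store/longnames) so the mapping can be reversed.
    """
    if len(path) <= MAX_STORE_PATH:
        return path
    digest = hashlib.sha1(path.encode("utf-8")).hexdigest()
    keep = MAX_STORE_PATH - len(digest) - 1  # room left for the tail + "~"
    short = path[-keep:] + "~" + digest
    longnames[short] = path
    return short

def decode_store_path(store_path, longnames):
    """Recover the working path; only hashed names need the side table."""
    return longnames.get(store_path, store_path)
```

The hash alone gives repeatability; the side table is needed only for the reverse direction, which is exactly the extra state the patch has to maintain.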

A different approach (probably too ambitious for me) would be to 
maintain a working -> store path mapping file (00mapping.[di]?) with 
pairs like

some/path/to/File.txt                some/path/to/_file.txt
some/very/truly/long/path/to/a/file  truly/long/path/to/a/file
another/truly/long/path/to/a/file    2ruly/long/path/to/a/file

Hg would add an entry after creating a new revlog in storage. (Or 
before? Not sure about locking semantics there.)
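A rough Python sketch of how such a mapping file might be read, queried and extended follows. The one-pair-per-line format mirrors the example above. Everything else is an assumption: the function names are hypothetical, paths are assumed to contain no whitespace (a real format would need escaping or a NUL separator), and identity stands in for the normal computed translation when a path has no entry. Locking is ignored entirely.

```python
def parse_mapping(text):
    """Parse 'working-path store-path' pairs, one per line."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        working, store = line.split()  # assumes no whitespace inside paths
        mapping[working] = store
    return mapping

def store_path_for(mapping, working):
    # Unmapped paths fall back to the ordinary computed translation;
    # identity is used here as a placeholder for that function.
    return mapping.get(working, working)

def add_entry(mapping, working, store):
    """Record the store path of a newly created revlog."""
    existing = mapping.get(working)
    if existing is not None and existing != store:
        raise ValueError("%s already maps to %s" % (working, existing))
    mapping[working] = store

def serialize_mapping(mapping):
    """Write the pairs back out, sorted for stable output."""
    return "".join("%s %s\n" % (w, s) for w, s in sorted(mapping.items()))
```

Because entries are only ever added, never recomputed, the store path no longer has to be derivable from the working path, which is what removes the need for a reversible hash.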

This seems related to the problem with repo growth after renames (filed 
as #883). Again, I was surprised when I found that renaming a file 
creates a new revlog; clearly this is required if the translation 
function is to be repeatable. If there were a persisted mapping file, 
perhaps Hg could reuse the same revlog for the renamed working path:

original-file  original-file
new-file       original-file

.hg/store/data/original-file.[di] would then keep revisions of both 
original-file and new-file, which would, I guess, permit good compression
if the file did not change much or at all during the move. (Probably the 
existing metadata keys 'copy' and 'copyrev' would still be needed for 
rename merges, --follow, etc. to work.)
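The rename idea can be modelled with a toy in-memory store, sketched below in Python under loudly stated assumptions. `ToyStore` is hypothetical, a dict of revision lists stands in for revlogs, and only the 'copy' key is recorded (a real 'copyrev' would carry the source node ID, omitted here). It shows that after a rename both working paths resolve to a single storage file.

```python
class ToyStore:
    """Hypothetical in-memory stand-in for .hg/store/data."""

    def __init__(self):
        self.mapping = {}  # working path -> store path (the proposed mapping file)
        self.revlogs = {}  # store path -> [(working path, text, meta), ...]

    def commit(self, working, text, meta=None):
        # A brand-new working path gets a revlog named after itself;
        # an already-mapped path reuses whatever revlog it points at.
        store = self.mapping.setdefault(working, working)
        self.revlogs.setdefault(store, []).append((working, text, meta or {}))
        return store

    def rename(self, old, new, text):
        # Point the new working path at the old revlog instead of creating
        # a fresh one, and keep 'copy' metadata for --follow, merges, etc.
        self.mapping[new] = self.mapping[old]
        return self.commit(new, text, {"copy": old})
```

Usage: after `s = ToyStore()`, `s.commit("original-file", "hello\n")` and `s.rename("original-file", "new-file", "hello\n")`, only one entry exists in `s.revlogs`, and it holds revisions for both names. That single entry is where a delta-compressed revlog could store the unchanged content cheaply.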

Am I off track here? Is there a reason why the current design is necessary?


