RFC: Improving space efficiency of revlog by splitting data files (any pointers to past discussions?)

Peter Arrenbrecht peter.arrenbrecht at gmail.com
Fri Feb 22 10:45:52 CST 2008

Hi all

I recently had an idea for improving revlog's space efficiency with
local clones and renames: split the data files once they get too big,
e.g. into store/myfile.d/{0,1,2,3,...}. The index would know which
fragment to address. Since revlogs are append-only, only the last
fragment would change, so larger parts of history could remain
hardlinked when revlogs change. For renames you could symlink to the
original name's revlogs and maybe force a split. Might also be good
for shallow clones (not all of

I haven't thought about this in depth yet, but since I'm skiing next
week I might just have some time to think about this in peace. So: Has
this been discussed before? Any pointers I should take with me?


ps. My notes so far (not fully thought through yet, but may give an
idea of where I'm headed):

Key ideas:

	* Split revlog data files into fragments at full copy boundaries.
	* Splitting at full copy boundaries retains a single read for
	  reconstructing a revision.
	* Create a new fragment once the last fragment grows beyond a
	  certain size.
	* Keeps storage hardlinks of local clones more effective over time.
	* Redirect fragments of renamed files to the original files. Allows
	  cheap renames/copies of large files.
	* Introduces at most one more file open and read per reconstruction
	  of a revision.
	* Use an indirection flag in the index; the redirection target is in
	  a separate file, or in its own fragment file.
	* A target per fragment allows for redirection of partial history
	  across multiple renames.
	* Only redirect if redirection will save sufficient space.
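
To make the splitting rule above a bit more concrete, here is a rough
sketch in Python. It is only a toy in-memory model, not actual revlog
code: the class name FragmentedData, the max_size threshold and the
is_full_copy argument are all made up for illustration. It shows just
the rollover idea: a new fragment is started only when the last one
has grown beyond the threshold and the chunk being written is a full
copy, so a delta chain never spans two fragments.

```python
from dataclasses import dataclass, field


@dataclass
class FragmentedData:
    """Toy model of store/myfile.d/{0,1,2,...} (hypothetical, in memory)."""

    max_size: int = 64 * 1024  # assumed threshold in bytes, not from revlog
    fragments: list = field(default_factory=lambda: [bytearray()])

    def append(self, chunk: bytes, is_full_copy: bool):
        """Store a chunk and return (fragment number, offset in fragment)."""
        last = self.fragments[-1]
        # Roll over only at a full copy: deltas always stay in the same
        # fragment as the full copy their chain starts from.
        if is_full_copy and len(last) >= self.max_size:
            last = bytearray()
            self.fragments.append(last)
        offset = len(last)
        last.extend(chunk)
        return len(self.fragments) - 1, offset


data = FragmentedData(max_size=10)
print(data.append(b"a" * 8, is_full_copy=True))   # full copy in fragment 0
print(data.append(b"b" * 8, is_full_copy=False))  # delta stays with its base
print(data.append(b"c" * 8, is_full_copy=True))   # next full copy rolls over
```

Note that a fragment can overshoot the threshold by the length of one
delta chain. That seems to be the price for keeping reconstruction a
single read.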



where myfile.d is fragment 0. In the index, we split the offset field
into a 4-byte offset and a 2-byte fragment number, meaning we always
split a fragment if an offset would exceed its new range. Or else add separate fragment
