RFC: Improving space efficiency of revlog by splitting data files (any pointers to past discussions?)

Brendan Cully brendan at kublai.com
Thu Feb 28 14:04:16 CST 2008


On Friday, 22 February 2008 at 17:45, Peter Arrenbrecht wrote:
> Hi all
> 
> I recently had an idea on how we could maybe improve revlog's space
> efficiency with local clones and renames. You split the data files
> once they get too big. Like store/myfile.d/{0,1,2,3,...}. The index
> would know which fragment to address. This would mean that larger
> parts of history can remain hardlinked when revlogs change. For
> renames you could symlink to the original name's revlogs and maybe
> force a split. Might also be good for shallow clones (not all of
> history).
> 
> I haven't thought about this in depth yet, but since I'm skiing next
> week I might just have some time to think about this in peace. So: Has
> this been discussed before? Any pointers I should take with me?

I think this is an interesting idea. It might be a bit simpler to
implement than overlay repositories [1]. On the other hand, I think
it's a bit less flexible and it has the drawback that the source
repository needs to be modified to increase the efficiency of the
target.

I also haven't worked on the overlay repository patch queue in some
time. I apologize especially to the two people who spent the time to
refresh the patches to more recent Mercurial code. I just haven't
found the time to finish the job.

One thing the overlay code doesn't do, which I think would be nice,
is share revision data even after indexes have diverged. For
instance, it would be great if pulling and merging from the parent
into the overlay shared the original revlog data, even though the
index would obviously be different (different revision numbers,
different parents).

[1] http://www.selenic.com/mercurial/wiki/index.cgi/OverlayRepository

> ps. My notes so far (not fully thought through yet, but may give an
> idea of where I'm headed):
> 
> Key ideas:
> 
> 	* Split revlog data files into fragments at full copy boundaries.
> 	* Splitting at full copy boundaries retains single read for
> 	  reconstructing revision.
> 	* Create new fragment as last fragment grows beyond certain size.
> 	* Keeps storage hardlinks of local clones more effective over time.
> 	* Redirect fragments of renamed files to original files. Allows cheap
> 	  renames/copies of large files.
> 	* Introduces at most one more file open and read per reconstruction
> 	  of a revision.
> 	* Use indirection flag in index, redirection target is in separate
> 	  file, or own fragment file.
> 	* Target per fragment allows for redirection of partial history
> 	  across multiple renames.
> 	* Only redirect if redirection will save sufficient space.
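
To make the first few points concrete, here is a rough sketch of how the
rollover rule might look. It is a toy model that only tracks fragment
sizes. The class name, the method name and the size threshold are all
hypothetical (the notes only say "certain size"), and none of this is
existing Mercurial code.

```python
# Assumed threshold; the notes above do not name a specific size.
FRAGMENT_LIMIT = 64 * 1024 * 1024


class FragmentedData:
    """Toy model of a fragmented revlog data file (sizes only, no data)."""

    def __init__(self, limit=FRAGMENT_LIMIT):
        self.limit = limit
        self.sizes = [0]  # fragment 0 would correspond to myfile.d

    def append(self, length, is_full_copy):
        """Return (fragment, offset) for a new revision of `length` bytes."""
        # Only a full copy may open a new fragment, so a delta chain never
        # straddles two fragments and can still be read in one go.
        if is_full_copy and self.sizes[-1] >= self.limit:
            self.sizes.append(0)
        fragment = len(self.sizes) - 1
        offset = self.sizes[fragment]
        self.sizes[fragment] += length
        return fragment, offset
```

With a limit of 100 bytes, deltas keep landing in fragment 0 even after it
has grown past the limit. Only the next full copy opens fragment 1, so
older fragments are never rewritten and can stay hardlinked.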
> 
> Layout:
> 
> 	.hg/store/
> 		my/folder/
> 			myfile.i
> 			myfile.d
> 			myfile.ds/
> 				1
> 				2
> 				3
> 				...
> 
> where myfile.d is fragment 0. In the index, we split the offset field
> into a 4-byte offset and a 2-byte fragment number, meaning we always
> start a new fragment if the offset would exceed its new range. Or
> else add a separate fragment number field.
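
A minimal sketch of that encoding, assuming the 6 bytes are stored
big-endian with the fragment number first (the notes do not fix the field
order, so that is my guess). The function names are hypothetical. The path
helper follows the layout above: fragment 0 is myfile.d, the rest live
under myfile.ds/.

```python
import struct

MAX_OFFSET = (1 << 32) - 1    # largest value that fits in a 4-byte offset
MAX_FRAGMENT = (1 << 16) - 1  # largest value that fits in 2 bytes


def pack_location(fragment, offset):
    """Encode (fragment, offset) into 6 bytes: 2 for fragment, 4 for offset."""
    if not 0 <= fragment <= MAX_FRAGMENT:
        raise ValueError("fragment number does not fit in 2 bytes")
    if not 0 <= offset <= MAX_OFFSET:
        # This is the case where the notes say a split is forced.
        raise ValueError("offset does not fit in 4 bytes; split first")
    return struct.pack(">HI", fragment, offset)


def unpack_location(data):
    """Decode 6 bytes back into (fragment, offset)."""
    fragment, offset = struct.unpack(">HI", data)
    return fragment, offset


def fragment_path(base, fragment):
    """Map a fragment number to its file, per the layout in the notes."""
    if fragment == 0:
        return base + ".d"
    return "%s.ds/%d" % (base, fragment)
```

One consequence of the 4-byte offset is that a fragment can address at
most 4 GiB, which is why a split becomes mandatory at that point rather
than just a space optimisation.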

