Lazy remote clones?
andreas.axelsson at gmail.com
Wed Mar 26 16:35:03 CDT 2008
One of the problems regularly raised about distributed version
control systems is repository size when a lot of "binary" files are
involved. Coming from a games development background, where assets
(images, models, video, sound) can easily use several tens of gigabytes
at the tip and maybe hundreds in the history, I certainly understand
that argument. And while I would really like some of the features a
DVCS offers, this is a factor holding me back.
So, I was thinking that perhaps it would be possible to replace historical
content with pointers to a "master" repository. Most of the time you don't
need to diff against or view old versions of binaries, but you do need the
history intact, and it's not uncommon that you need to recreate old builds
or patch previous releases. So, I would propose two features:
1. The ability to tell "clone" not to copy large revisions
(other than those at the tip) of files that aren't stored in a "diff"-style
format (which I assume binaries aren't anyway, since they usually diff very
badly). Instead, it would store a pointer to the original repo, allowing a
user to fetch the content only when it's needed, which would be rarely. I
admit this goes a bit against the distributed mentality, but it could be an
optional feature for those who can maintain a central repo without size
constraints. The option to clone completely would of course remain. Oh, and
for local clones I guess this would normally be pointless, since files are
hard-linked anyway. So it would be useful only for remote clones.
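To make the idea concrete, here is a minimal sketch of what such a pointer record and lazy fetch might look like. This is not how Mercurial stores revisions; the record layout, the `make_pointer`/`resolve` names, and the `fetch` callable are all hypothetical, chosen just to illustrate replacing a stored blob with a hash plus the master repo's location:

```python
import hashlib

def make_pointer(data, master_url):
    """Replace raw revision data with a small pointer record
    (hypothetical format: content hash, size, and master repo URL)."""
    return {
        "kind": "pointer",
        "sha1": hashlib.sha1(data).hexdigest(),
        "size": len(data),
        "source": master_url,
    }

def resolve(entry, fetch):
    """Return the real data, fetching from the master repo if the
    entry is a pointer. `fetch` is an assumed callable that retrieves
    a blob by its hash (e.g. over the network)."""
    if isinstance(entry, dict) and entry.get("kind") == "pointer":
        data = fetch(entry["sha1"])
        # Verify integrity against the recorded hash before trusting it.
        if hashlib.sha1(data).hexdigest() != entry["sha1"]:
            raise ValueError("fetched blob does not match recorded hash")
        return data
    return entry

# Usage: the master repo keeps the blob; the lazy clone keeps only the pointer.
blob = b"\x00" * 4096  # stand-in for a large binary asset
master = {hashlib.sha1(blob).hexdigest(): blob}
ptr = make_pointer(blob, "http://master.example/repo")
restored = resolve(ptr, master.__getitem__)
```

The key point is that revision hashing could stay intact: the pointer records the same content hash the full data would have, so integrity checking still works once the blob is fetched.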
2. The ability to "purge" large binaries from a local clone, replacing
them with pointers to the original parent repo. This would be useful if
you've done a lot of history browsing, or if you've been updating the repo
for a long time, pulling new tip versions of those binaries and building up
a lot of payload in your local repo.
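The purge could be sketched the same way: walk the local store and swap every large non-tip revision for a pointer record. Again, the `purge_history` name, the in-memory dict standing in for the store, and the size threshold are all illustrative assumptions, not Mercurial internals:

```python
import hashlib

def purge_history(revisions, tip, master_url, threshold=1024 * 1024):
    """Replace every non-tip revision larger than `threshold` bytes
    with a pointer record; return the number of bytes reclaimed.
    `revisions` is a hypothetical rev -> data mapping."""
    reclaimed = 0
    for rev, data in list(revisions.items()):
        if rev == tip or not isinstance(data, bytes) or len(data) < threshold:
            continue  # keep the tip, existing pointers, and small files
        revisions[rev] = {
            "kind": "pointer",
            "sha1": hashlib.sha1(data).hexdigest(),
            "size": len(data),
            "source": master_url,
        }
        reclaimed += len(data)
    return reclaimed
```

A threshold matters here: purging tiny files would save nothing while still forcing a network round trip to read them back.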
I haven't looked into the repo data structure enough to know whether this
would break the general model anywhere, especially when it comes to repo
integrity checking and revision hashing, but I'm sure there are those of
you who have, and that you'll let me know if I'm completely lost here.
Anyway, this is a common model even for centralized version control tools
that handle large assets; they simply allow an admin to back old revisions
up to tape. Anyone asking for such a revision gets a message that the file
is offline, which they can bring to an administrator. Something similar
could be done for Hg, prompting the user to get an internet connection up
in order to fetch the missing files.