[RFC] largefiles - "system-wide" cache
Eli.Carter at tektronix.com
Wed Oct 19 09:17:21 CDT 2011
> From: gerg.ward at gmail.com [mailto:gerg.ward at gmail.com] On Behalf Of
> Greg Ward
> On Tue, Oct 18, 2011 at 4:38 PM, Carter, Eli <Eli.Carter at tektronix.com>
> > The largefiles extension talks a lot about a system-wide cache. There are
> two problems with this. First, this cache is per-user, not system-wide. And
> second, it is more accurately described as a 'store' than as a 'cache'.
> I have been thinking about sending in a patch to rename it to "user cache",
> but I had not realized that it's really a store. Thanks for clarifying!
> (In fact, I was going to propose moving it to ~/.cache/hg-largefiles, since after
> all "it's a cache, it should go in ~/.cache to clarify that it's safe to dispose of".
> Oops! Good thing I didn't get around to it.)
Heh. Actually, I think it _should_ be a cache. And making $repo/.hg/largefiles be the store should allow ~/.largefiles (or whatever name) to be just a cache.
> > Regarding 'system-wide' vs 'per-user':
> > The extension assumes files are in the system cache, but if one user is
> cloning a repository from another user on the same machine, those files
> _won't_ be in the "system-wide" cache, and the clone winds up without the
> largefiles (and spews "Can't get file locally" errors).
> Ick. Sounds like we need more test cases.
I have a testcase to demonstrate this. I'll try to post it later.
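The failure mode is visible from the path computation alone: the cache location is derived from the current user's home directory, so files cached under the serving user's home are invisible to the cloning user. A minimal Python illustration (the ~/.largefiles location matches the current implementation; the helper name is mine):

```python
import os

def user_cache_path(filehash, home):
    # The largefiles "system-wide" cache actually lives under the
    # *current* user's home directory, so two users on the same
    # machine compute different paths for the same file.
    return os.path.join(home, '.largefiles', filehash)

# alice serves the repo locally; bob clones from her working copy.
alice = user_cache_path('d2a8fee1', '/home/alice')
bob = user_cache_path('d2a8fee1', '/home/bob')
assert alice != bob  # bob's clone looks in his cache and finds nothing
```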
> > Therefore, there are two things I'd like to see:
> > 1: I'd like to have the extension populate $repo/.hg/largefiles in addition to
> ~/.largefiles (using hardlinks where possible), and reference it when looking
> for files.
> Seems reasonable. Actually I'd like someone to explain to me why we need
> two complete copies of all the largefiles on the local system.
> ;-) If that is in fact the case... so far I've mainly been reading the code rather
> than actually *using* largefiles.
The purpose is to ensure that when you do another clone of the repo from the server, you don't have to download those 1.5GB files all over again. And the intent, from what I understand, was for these to be hardlinked so you don't have multiple copies, just multiple references.
I don't know how well that works on Windows machines.
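To make the hardlink point concrete, here is a small POSIX-only sketch (the helper name and copy fallback are illustrative, not the extension's actual API; NTFS does support hardlinks, but FAT filesystems do not, which is presumably the Windows concern):

```python
import os
import shutil
import tempfile

def link_into_cache(src, cache_dir, filehash):
    """Place src in the cache as a hardlink, falling back to a copy.
    Hypothetical helper, sketching the intended sharing behavior."""
    dst = os.path.join(cache_dir, filehash)
    try:
        os.link(src, dst)      # same inode: no extra disk usage
    except OSError:
        shutil.copy(src, dst)  # cross-device or no-hardlink fallback
    return dst

cache = tempfile.mkdtemp()
fd, big = tempfile.mkstemp()
os.write(fd, b'1.5GB of data, notionally')
os.close(fd)
cached = link_into_cache(big, cache, 'deadbeef')
# Both names reference one inode when the link succeeded: the link
# count rises to 2 but the file's blocks are stored only once.
print(os.stat(big).st_nlink)  # 2 if hardlinked; 1 if it fell back to a copy
```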
> Tweak: it should be ~/.hg-largefiles, not ~/.largefiles.
> > 2: I'd like to have an --all-largefiles option for hg clone and hg pull that
> downloads all versions of all largefiles referenced by any changeset included
> in the transfer.
> Also seems reasonable, but it's unclear to me whether those downloaded
> revs belong in ~/.hg-largefiles or $repo/.hg/largefiles. The relationship
> between those two directories is still unclear to me.
As it is currently implemented:
$repo/.hg/largefiles plays two roles: for the 'main' repository, it is a 'store' of all the largefiles; for clones of the 'main' repository, it is a 'cache' of the largefiles that the client downloaded.
~/.largefiles serves the same two roles, but allows different repositories on the same machine to share the cache and reduce overall disk usage via hardlinks.
As (I think) it should be implemented:
~/.largefiles (or ~/.hg-largefiles) should be a cache and only a cache.
$repo/.hg/largefiles should be a store.
I think we really need some way to mark 'this repo must maintain a complete store', so that we can assert 'this repo is always complete'. As it stands, I worry about data getting lost in a convoluted backup-and-restore shuffle.