largefiles: still confused about store vs cache on the client

Na'Tosha Bard natosha at unity3d.com
Mon Oct 24 05:15:08 CDT 2011


2011/10/23 Greg Ward <greg at gerg.ca>

> Hi all --
>
> a few days ago, Eli Carter perfectly described the confusion about
> stores and caches with largefiles:
>
>  http://thread.gmane.org/gmane.comp.version-control.mercurial.devel/44912
>
> The ensuing thread got us somewhere, and I think the patches sent by
> Benjamin as a result helped. But I'm still confused about a rather
> fundamental point: on the client, why do we need *both* a user cache
> (currently ~/.cache/largefiles) *and* a local store (.hg/largefiles)?
>
> The server-side is fairly clear: we must have a complete and canonical
> store containing every revision of every large file in history. That
> is what .hg/largefiles is for *on the server* (right?). And there is
> no need for a cache on the server, because no one has a working dir on
> the server. (And if they did, I suppose you could just take large file
> revs straight from the store.)
>
> (Yes, all this talk of clients and servers is unorthodox in DVCS
> circles, but you know what I mean. There's a big difference between
> the repo on your local disk where you work, and the repo out there on
> the network that you push to. largefiles just takes this existing
> informal distinction and makes it a little more formal; it injects a
> little old-fashioned client/server talk from the bygone days of CVCS
> into a modern DVCS. Yes it's impure, but "practicality beats purity".)
>
> But back on the client, where I pull and push and update and commit,
> what purpose does .hg/largefiles serve? Having a local cache is
> obviously a good thing, although it's not essential. (I never got
> around to implementing caching with bfiles, and we've lived without
> it. It wastes bandwidth and increases network uptime requirements, but
> our LAN at work is fast and reliable. And our biggest bfile is ~30 MB:
> peanuts by game developer standards.)
>
> More importantly, the very meaning of .hg/largefiles appears to be
> inconsistent from reading hgext/largefiles/design.txt: on the server,
> it contains every revision of every largefile ("complete and
> canonical"). But on the client, it's just a subset of that. So ...
> it's ... like ... a cache. Except it's not called a cache; that's what
> ~/.cache/largefiles is. Huh?
>
> The only reason I can see for having something other than a cache is
> for outgoing revs: if I do
>
>  hg add --large largefile
>  hg commit
>  # modify largefile
>  hg commit
>  # modify largefile
>  hg commit
>  hg push
>
> Then push has to send 3 revs to the server. Only one of them is in the
> working dir, and even that's not guaranteed: the user might have
> modified it post-commit. So those committed revs have to come from
> somewhere. Keeping them in the cache is risky, because users are
> allowed to nuke their ~/.cache directory whenever they like. (And
> maybe someday we'll add LRU semantics with maximum disk space to the
> largefiles cache.)
>
> If *that* is the purpose of .hg/largefiles on the client, then I
> understand. But I think it's dangerous using .hg/largefiles as a
> complete canonical store on the server, and as
> subset-of-the-store-that-is-kinda-like-a-cache-but-not-really-and-oh-by-the-way-also-holds-outgoing-revs
> on the client.
>
> Why not .hg/lfoutgoing?
>
> Greg
>

I think the fundamental thing you are missing here is that it is quite
possible for a user to have multiple clones that share the same set of
largefiles.  If there is a team that uses branch-by-cloning, this is almost
*certainly* the case.  Our team does, and I'm sure there are still others --
which will continue to be the case until either
feature-branching-by-named-branches is no longer discouraged or bookmarks
are actually supported in the real world (which means by hosting solutions,
continuous integration solutions, etc).

By keeping a copy of every largefile in a local cache, a user who makes a new
branch clone, or updates to a revision that needs a largefile already used by
another clone, can simply copy it out of the cache rather than re-download it,
thus saving bandwidth (which is one of the goals of this extension anyway).
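
To make the idea concrete, here is a rough sketch of the lookup described
above. It is not the extension's actual code: the function name
get_largefile, its parameters, and the download callback are all
hypothetical, and it assumes largefiles are stored in the cache under their
content hash.

```python
import os
import shutil


def get_largefile(file_hash, cache_dir, dest, download):
    """Place the largefile identified by file_hash at dest.

    Hypothetical helper, not part of the largefiles extension.
    Returns "cache" if the file was already in the shared cache,
    or "download" if it had to be fetched first.
    """
    cached = os.path.join(cache_dir, file_hash)
    source = "cache"
    if not os.path.exists(cached):
        # Cache miss: fetch once and keep a copy for every other clone.
        os.makedirs(cache_dir, exist_ok=True)
        with open(cached, "wb") as f:
            f.write(download(file_hash))
        source = "download"
    # Hit or miss, the working copy is always populated from the cache.
    shutil.copyfile(cached, dest)
    return source
```

With something like this, the first clone that needs a given largefile pays
for the download, and every later clone on the same machine that needs the
same hash only pays for a local copy.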

Cheers,
Na'Tosha

-- 
*Na'Tosha Bard*
Build & Infrastructure Developer | Unity Technologies

*E-Mail:* natosha at unity3d.com
*Skype:* natosha.bard