largefiles: still confused about store vs cache on the client

Sun Oct 23 12:39:57 CDT 2011

Hi all --

a few days ago, Eli Carter perfectly described the confusion about
stores and caches with largefiles:

  http://thread.gmane.org/gmane.comp.version-control.mercurial.devel/44912

The ensuing thread got us somewhere, and I think the patches sent by
Benjamin as a result helped. But I'm still confused about a rather
fundamental point: on the client, why do we need *both* a user cache
(currently ~/.cache/largefiles) *and* a local store (.hg/largefiles)?

The server side is fairly clear: we must have a complete and canonical
store containing every revision of every large file in history. That
is what .hg/largefiles is for *on the server* (right?). And there is
no need for a cache on the server, because no one has a working dir on
the server. (And if they did, I suppose you could just take large file
revs straight from the store.)

(Yes, all this talk of clients and servers is unorthodox in DVCS
circles, but you know what I mean. There's a big difference between
the repo on your local disk where you work, and the repo out there on
the network that you push to. largefiles just takes this existing
informal distinction and makes it a little more formal; it injects a
little old-fashioned client/server talk from the bygone days of CVCS
into a modern DVCS. Yes it's impure, but "practicality beats purity".)

But back on the client, where I pull and push and update and commit,
what purpose does .hg/largefiles serve? Having a local cache is
obviously a good thing, although it's not essential. (I never got
around to implementing caching with bfiles, and we've lived without
it. It wastes bandwidth and increases network uptime requirements, but
our LAN at work is fast and reliable. And our biggest bfile is ~30 MB:
peanuts by game developer standards.)

More importantly, the very meaning of .hg/largefiles appears to be
inconsistent, judging by hgext/largefiles/design.txt: on the server,
it contains every revision of every largefile ("complete and
canonical"). But on the client, it's just a subset of that. So ...
it's ... like ... a cache. Except it's not called a cache; that's what
~/.cache/largefiles is. Huh?

The only reason I can see for having something other than a cache is
for outgoing revs: if I do

  hg add --large largefile
  hg commit
  # modify largefile
  hg commit
  # modify largefile
  hg commit
  hg push

Then push has to send 3 revs to the server. Only one of them is in the
working dir, and even that's not guaranteed: the user might have
modified it post-commit. So those committed revs have to come from
somewhere. Keeping them in the cache is risky, because users are
allowed to nuke their ~/.cache directory whenever they like. (And
maybe someday we'll add LRU semantics with maximum disk space to the
largefiles cache.)
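
To make that concern concrete, here is a rough Python sketch. It is
*not* the real largefiles code: the helper names (put, find_rev, demo)
are made up for illustration, the directories are temp dirs standing
in for .hg/largefiles and ~/.cache/largefiles, and I'm assuming
content-addressed files named by their SHA-1. It commits three revs
into both locations, nukes the cache, and then checks where the
outgoing revs can still be found.

```python
import hashlib
import os
import shutil
import tempfile


def put(directory, data):
    """Write data into directory under its SHA-1 hex digest (assumed naming)."""
    os.makedirs(directory, exist_ok=True)
    digest = hashlib.sha1(data).hexdigest()
    with open(os.path.join(directory, digest), "wb") as f:
        f.write(data)
    return digest


def find_rev(digest, locations):
    """Return the path of the first location holding digest, or None."""
    for directory in locations:
        path = os.path.join(directory, digest)
        if os.path.exists(path):
            return path
    return None


def demo():
    root = tempfile.mkdtemp()
    try:
        # Hypothetical stand-ins for the two client-side locations.
        store = os.path.join(root, "repo", ".hg", "largefiles")
        cache = os.path.join(root, "home", ".cache", "largefiles")

        # Three commits of the same largefile: each rev lands in both places.
        outgoing = []
        for data in (b"rev 1", b"rev 2", b"rev 3"):
            outgoing.append(put(store, data))
            put(cache, data)

        # The user nukes ~/.cache, as users are allowed to do.
        shutil.rmtree(cache)

        # Can push still find all three outgoing revs?
        cache_only = [find_rev(h, [cache]) is not None for h in outgoing]
        with_store = [find_rev(h, [store, cache]) is not None for h in outgoing]
        return cache_only, with_store
    finally:
        shutil.rmtree(root)


if __name__ == "__main__":
    print(demo())
```

With only the cache to rely on, all three outgoing revs are gone after
the rmtree; with a repo-local directory consulted first, push can
still find them. That is the one job I can see for a non-cache
directory on the client.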

If *that* is the purpose of .hg/largefiles on the client, then I
understand. But I think it's dangerous to use .hg/largefiles both as a
complete canonical store on the server and as a
subset-of-the-store-that-is-kinda-like-a-cache-but-not-really-and-oh-by-the-way-also-holds-outgoing-revs
on the client.

Why not .hg/lfoutgoing?

Greg