[RFC] kbfiles: an extension to track binary files with less wasted bandwidth

Matt Mackall mpm at selenic.com
Tue Jul 26 15:23:06 CDT 2011


On Tue, 2011-07-26 at 14:23 -0400, Andrew Pritchard wrote:
> kbfiles has several mechanisms for defending its repositories against damage
> from non-kbfiles clients:
> - add a 'kbfiles' line to .hg/requires in order to keep non-kbfiles clients
>   from breaking things;
> - add a 'bfilestore' server capability, without which the client will not
>   attempt to interact with a remote repository when the local repository uses
>   kbfiles; and
> - prepend 'kbfiles\n' to the output of the heads command when serving kbfiles
>   repositories to prevent non-kbfiles clients from creating broken clones.
> 
> The last of these is fairly likely to be controversial, but it currently seems
> to be necessary.  Although the HG19 bundle format as described on the wiki
> would appear to solve the problem with its feature strings, it does not
> appear to be implemented yet.  If and when it is, kbfiles will replace the
> heads command hack with a 'kbfiles' bundle feature.  Unfortunately, the result
> is that non-kbfiles clients throw an exception with no mention of kbfiles, but
> we could not find a way to make the client display a useful error message while
> consistently preventing them from uploading changesets without the
> corresponding bfiles or creating clones that are missing files.
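For reference, here's a rough Python sketch of why the requires line and the
heads hack stop an old client. This is not Mercurial's actual code: the
function names, the RepoError class and the SUPPORTED set are made up for
illustration, and parse_heads is only assumed to mirror how a client decodes
a heads reply (whitespace-separated hex node IDs).

```python
import binascii

# Hypothetical set of requirements an old, non-kbfiles client understands.
SUPPORTED = {"revlogv1", "store", "fncache", "dotencode"}


class RepoError(Exception):
    pass


def check_requires(requires_text, supported=SUPPORTED):
    """Refuse a repository whose .hg/requires lists an unknown feature."""
    wanted = {line.strip() for line in requires_text.splitlines() if line.strip()}
    missing = sorted(wanted - supported)
    if missing:
        raise RepoError("repository requires features unknown to this "
                        "client: %s" % ", ".join(missing))
    return wanted


def parse_heads(reply):
    """Decode a heads reply as whitespace-separated hex node IDs."""
    # unhexlify raises binascii.Error on a token like 'kbfiles', so the
    # old client dies with an error that never mentions the extension.
    return [binascii.unhexlify(tok) for tok in reply.split()]
```

With the 'kbfiles\n' prefix in place, parse_heads fails on the very first
token, before any real node is decoded, which is exactly the unhelpful
exception described above.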

Ok, so the issues are:

a) we don't want clients to get incomplete/broken/bogus check-outs
b) we don't want clients to fail to push big files back to servers
c) we (probably?) don't want clients to convert big files back into very
large normal files and then push them again

On the other hand, we probably don't want to break the entire protocol.

So we want to cleanly refuse push and pull to clients who don't identify
themselves as big file users. I think we can probably manage this, and
still work with old clients:

$ hg in http://selenic.com/fail.bin
real URL is http://www.selenic.com/fail.bin
abort: 'http://www.selenic.com/fail.bin' does not appear to be an hg
repository:
---%<--- (application/octet-stream)
oops!

You're trying to pull from a server that requires you to have the
foo extension enabled.

---%<---
!

Support for displaying this goes back as far as 1.4. Generating a similar
ssh banner should be even easier, since there we have an independent error
stream.
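A minimal sketch of the HTTP side, written as a plain WSGI app rather than
hgweb: a request for a legacy push/pull command gets a content type the
client doesn't recognize as Mercurial's plus a human-readable body, which an
old client prints between the ---%<--- markers as above. The REFUSED command
set and the message text are illustrative assumptions, and the `cmd` query
parameter is assumed to be how the HTTP protocol names the command.

```python
from urllib.parse import parse_qs

# Hypothetical: legacy wire commands an old client would use to push or pull.
REFUSED = {"changegroup", "changegroupsubset", "getbundle", "unbundle"}

MESSAGE = (b"oops!\n\n"
           b"You're trying to pull from a server that requires you to have\n"
           b"the foo extension enabled.\n")


def app(environ, start_response):
    """WSGI sketch: answer legacy push/pull commands with a readable error."""
    cmd = parse_qs(environ.get("QUERY_STRING", "")).get("cmd", [""])[0]
    if cmd in REFUSED:
        # A non-Mercurial content type makes the client abort and echo
        # the body verbatim to the user.
        start_response("200 OK", [("Content-Type", "application/octet-stream"),
                                  ("Content-Length", str(len(MESSAGE)))])
        return [MESSAGE]
    # Anything else would go to the real protocol handler; this sketch
    # just reports that it doesn't handle it.
    body = b"not handled in this sketch\n"
    start_response("501 Not Implemented", [("Content-Type", "text/plain"),
                                           ("Content-Length", str(len(body)))])
    return [body]
```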

Unfortunately, having the server decide whether or not to serve a client
based on _client_ capabilities is something we've carefully avoided up
to this point: all clients should be capable of reading from all
servers, and the client is supposed to make all the decisions based on
reported server capabilities. So the client never advertises its
capabilities to the server because the server doesn't care.

So the server needs to advertise "bigfiles", _move_ the existing
push/pull commands to new names, and put stubs in their place, so that
any client still using the old commands gets the error message.
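Roughly, in Python. Everything here is hypothetical: the "bigfiles-" prefix,
the toy handlers and the function names are placeholders, and only two
commands are shown. The point is that the server never looks at client
capabilities; a bigfiles-aware client picks command names from what the
server advertises, and an old client keeps calling the old names and gets
the banner.

```python
# Hypothetical banner returned by the stubs left at the old command names.
BANNER = "This server requires the bigfiles extension.\n"


def real_unbundle(data):
    return "pushed %d bytes" % len(data)


def real_getbundle(heads):
    return "bundle for %s" % ",".join(heads)


def refuse(*args):
    return BANNER


def build_server_commands():
    """Move push/pull to new names and leave refusing stubs at the old ones."""
    commands = {"unbundle": real_unbundle, "getbundle": real_getbundle}
    capabilities = {"unbundle", "getbundle"}
    for name in ("unbundle", "getbundle"):
        commands["bigfiles-" + name] = commands[name]
        commands[name] = refuse
    capabilities.add("bigfiles")
    return commands, capabilities


def client_command(name, server_caps):
    """A bigfiles-aware client chooses the name from server capabilities."""
    if "bigfiles" in server_caps:
        return "bigfiles-" + name
    return name
```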

> The extension also currently supports talking to previous versions of Kiln that
> still serve bfiles over a different interface, via POST and GET requests to
> $REPO/bfile/$SHA.  Although we would prefer to keep this in the extension, we
> are able and willing to pull it out into its own meta-extension if necessary.

I guess this is for versions of Kiln that exist outside of your control?

Moving an extension into the main repo is pretty much the last point at
which we get to break backward compatibility and drop legacy support, so
I would ask you to seriously consider taking this opportunity to
jettison anything you don't want to support long-term.

> We are still in the process of cleaning up the code to ship with Mercurial, but
> the current status can be seen at
> http://developers.kilnhg.com/Repo/Kiln/Group/Unstable/Files.  Before the 'real'
> pull request, we will collapse it into a single patch in the hgext directory.
> Planned changes before then include removing compatibility shims for old
> versions of Mercurial and some minor rebranding to remove mentions of 'Kiln'
> from the code and repository layout.
> 
> We would prefer to avoid renaming the extension if possible, both to avoid
> adding extra code to handle both old repositories and new ones and to reflect
> the heritage of the extension, but we understand that parts of the Mercurial
> community may be opposed to the name 'kbfiles', and as such we are willing to
> rename to 'terafiles' if the name would otherwise block the extension from
> shipping with Mercurial.

I don't particularly object to Kiln part of the heritage being visible
and documented (though we also shouldn't lose track of Greg Ward's
contribution here!). I note from the repo that there's a shortage of
copyright headers; we'll want to get some on there.

But I think the name is liable to be a source of confusion:

- unlike the original 'bigfiles', its purpose isn't immediately obvious
- for a while at least, it won't be clear from bug reports which kbfiles
we're talking about and who's responsible for it
- as I've mentioned before, 'kb' actually implies -small- files!

I don't think 'terafiles' is ideal here either. How about simply
'largefiles'? It's not taken already and is clearly distinct from the
existing bigfiles/bfiles/kbfiles.

-- 
Mathematics is the supreme nostalgia of our time.




More information about the Mercurial-devel mailing list