[RFC] kbfiles: an extension to track binary files with less wasted bandwidth

Andrew Pritchard awpritchard at gmail.com
Sat Aug 6 13:17:05 CDT 2011


As for documentation, we (or at least I) have been putting it off until
largefiles is closer to release - at the moment there are still a few
outstanding bugs and plenty of internal testing to do.  Nonetheless, using a
non-Kiln store is pretty simple: serve the repository via hgweb or ssh with the
largefiles extension enabled, and everything should work.  There are still some
concerns about the more distributed way it can work now: the extension always
looks for largefiles on the default path, so it might be appropriate to add a
config option for a default store separate from the 'default' and
'default-push' paths.
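To make the store-selection idea concrete, here is a minimal Python sketch (Python being Mercurial's own language). The '[largefiles] defaultstore' option is hypothetical, not an existing setting, and configparser only approximates Mercurial's real hgrc parser (it knows nothing about %include and similar directives).

```python
import configparser
from typing import Optional


def largefile_store_url(hgrc_text: str) -> Optional[str]:
    """Return the URL largefiles should fetch from, or None if unset.

    '[largefiles] defaultstore' is a HYPOTHETICAL option illustrating the
    proposal; it is not a real Mercurial setting.
    """
    # interpolation=None so '%' characters in URLs are taken literally.
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(hgrc_text)
    # Prefer an explicitly configured largefile store...
    store = config.get("largefiles", "defaultstore", fallback=None)
    if store:
        return store
    # ...otherwise keep today's behaviour and use the 'default' path.
    return config.get("paths", "default", fallback=None)


hgrc = """
[paths]
default = https://hg.example.com/project
default-push = ssh://hg.example.com/project

[largefiles]
defaultstore = https://lfstore.example.com/project
"""
print(largefile_store_url(hgrc))  # prints https://lfstore.example.com/project
```

Falling back to 'default' keeps the current behaviour for anyone who never sets the new option.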

At the moment, largefiles' branching is somewhat confusing, since we have one
repository containing what should be incorporated into Mercurial and a separate
repository for what we will ship with the Kiln Extensions in order to aid
migrating repositories to the newer layout.  As such, general largefiles fixes
are going into the 'largefiles' repo, and migration code and Kiln-specific work
are going into the 'largefiles-kiln' repo.
Unfortunately, this looks likely to break down as soon as we start stripping
compatibility for old versions of Mercurial from the 'largefiles' repo, as we
don't want to merge anti-backwards-compat changes into the Kiln version, but we
will still want to pull bugfixes and feature additions.  As the two diverge, we
will probably add another repository for changes we want in both, and we can
add a branch repository there for Unity's contributions.

As for testing with Kiln, we have split out the Kiln communication code into a
'kilnstore' extension, whose repository is in the same place as the largefiles
ones.  It looks for largefiles and monkey-patches in the code for talking to
Kiln's kbfiles routes.  With both largefiles and kilnstore enabled, there
_shouldn't_ be any problems, but not many people have been using the latest
version (since the rename) - possibly only me, in fact - so it may well still
have bugs.  Three things have changed in the repository storage along with the
name: the 'kbfiles' requirement is now 'largefiles', the '.hg/kilnbfiles'
directory is now '.hg/largefiles', and the '.kbf' standin directory is now
'.hglf'.  The last change was made mostly because the '.hg*' prefix is
traditionally reserved for Mercurial's use, so '.hglf' is substantially less
likely to collide with anyone's normal files.  The first two changes are
handled transparently by largefiles-kiln's migration code, which renames the
directory and changes the requirement if the old one is present.  The third is
only partly transparent: largefiles-kiln still works with repositories that
have '.kbf' standins, but it cannot migrate them automatically, because
renaming the standins would change the changeset nodeids.  Right now, the only
ways to migrate are to use lfconvert to convert via a normal repository, or to
use the convert extension with a filemap that renames the '.kbf' directories.
An actual migration command is coming.
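The two transparent renames amount to roughly the following. This is a sketch of the idea only, not largefiles-kiln's actual migration code; the function name is made up, and real code would take the repository lock and write '.hg/requires' atomically. It deliberately leaves '.kbf' standins alone, since renaming them means rewriting history.

```python
from pathlib import Path


def migrate_kbfiles_storage(repo_root: str) -> bool:
    """Rename old kbfiles storage to the largefiles layout, in place.

    HYPOTHETICAL helper mirroring the two renames described above.
    Returns True if anything was changed.
    """
    hg = Path(repo_root) / ".hg"
    changed = False

    # 1. '.hg/kilnbfiles' -> '.hg/largefiles', unless the new one already exists.
    old_dir, new_dir = hg / "kilnbfiles", hg / "largefiles"
    if old_dir.is_dir() and not new_dir.exists():
        old_dir.rename(new_dir)
        changed = True

    # 2. 'kbfiles' -> 'largefiles' in .hg/requires (one requirement per line).
    requires = hg / "requires"
    if requires.exists():
        reqs = requires.read_text().splitlines()
        if "kbfiles" in reqs:
            reqs = ["largefiles" if r == "kbfiles" else r for r in reqs]
            # Drop a duplicate if 'largefiles' was somehow already listed,
            # keeping the original order of the other requirements.
            reqs = list(dict.fromkeys(reqs))
            requires.write_text("\n".join(reqs) + "\n")
            changed = True
    return changed
```

Running it a second time is a no-op, which is what you want from code that runs whenever an old repository is opened.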

Largefiles currently opens a large number of connections to download the
files it needs, which I have recently discovered to be particularly annoying
when IIS takes a full second to decide which HTTP handler to call for a
request.  Two fairly simple changes could alleviate this: first, make
'statlfile' batchable; and second, turn 'getlfile' into 'getlfiles', which
would send multiple files over one connection, either in an ad-hoc line-based
protocol like Mercurial's ssh transport or in a tar archive.  The same could be
done for 'putlfile'.
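A batched 'getlfiles' response using length-prefixed, line-based framing might look something like this. The header format ('<hash> <length>' per file) and the function names are assumptions for illustration, not an existing largefiles command or Mercurial's actual ssh wire format.

```python
import io
from typing import BinaryIO, Dict, Iterable, Tuple


def write_lfiles(out: BinaryIO, files: Iterable[Tuple[str, bytes]]) -> None:
    """Write several largefiles to one stream (HYPOTHETICAL framing)."""
    for hexhash, data in files:
        # Header line: '<hash> <length>\n', then exactly <length> raw bytes.
        out.write(b"%s %d\n" % (hexhash.encode("ascii"), len(data)))
        out.write(data)


def read_lfiles(inp: BinaryIO) -> Dict[str, bytes]:
    """Read files framed by write_lfiles until end of stream."""
    files = {}
    while True:
        header = inp.readline()
        if not header:  # EOF: no more files
            break
        hexhash, length = header.rstrip(b"\n").split(b" ")
        size = int(length)
        data = inp.read(size)
        if len(data) != size:
            raise ValueError("truncated data for " + hexhash.decode("ascii"))
        files[hexhash.decode("ascii")] = data
    return files


# Round trip two files through an in-memory "connection".
buf = io.BytesIO()
write_lfiles(buf, [("a" * 40, b"first file"), ("b" * 40, b"second\nfile")])
buf.seek(0)
print(read_lfiles(buf))
```

Prefixing each file with its length means file contents never need escaping, even when they contain newlines; the tar variant would trade this hand-rolled framing for Python's tarfile module.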


More information about the Mercurial-devel mailing list