bfiles: Rethinking the .hgbfiles directory (WAS: Re: bfiles filename encoding)

Greg Ward greg at gerg.ca
Fri Jun 11 09:19:12 CDT 2010


On Wed, Jun 9, 2010 at 6:25 PM, Benjamin Pollack <benjamin at bitquabit.com> wrote:
> On Jun 7, 2010, at 11:50 AM, Benoit Boissinot wrote:
>
>> I'm not sure you need the same perf characteristics as hg, so why
>> can't you just hash the filename (or is that even necessary, can't you
>> have a git like approach with just blobs indexed by their sha? does
>> the server need to know the filenames?)
>
> That's basically what I've decided is the right solution.
>
> I've been doing a lot of trying to actually use bfiles on various systems, and
> I've become convinced that the existing .hgbfiles system cannot meaningfully
> be made cross-platform. The best solution I can come up with is hybridencode
> or a variant. Unfortunately, hybridencode degrades into a non-reversible hash
> format, which breaks bfiles' expectation that the .hgbfiles path is
> reversible. After some spelunking through "real" repositories, I reluctantly
> have to say that this comes up way more often than you might think--i.e.,
> bfiles can expect to hit the non-reversible version of hybridencode often. The
> only suitable solution to this I can come up with is using a manifest of some
> type to restore reversibility.

Hangonasec.  We started by talking about filename encodings in the
bfiles central store, which definitely needs to change.  Now we're
talking about the standin tree, .hgbfiles/.  Those are just files in
your working dir.  If there is something in .hgbfiles/ that cannot be
represented in the local filesystem, then you're gonna have a hard
time representing the actual big files!  So I don't see much value in
redesigning that tree.

> Provided that you agree with everything I just said, I don't see a value in
> keeping .hgbfiles as a directory. Instead, it'd make sense to me to bring it
> in line with Mercurial's manifests and simply make it be a list of tuples in
> the form (filename, SHA1 checksum). bfput/bfupdate would be updated
> accordingly to merely look for a file named after the SHA-1 checksum,
> regardless of path. Everything else in bfiles (i.e., the UI, the structure of
> .hg/bfiles, etc.) would basically continue to work as-is.

Yeah, that was my original idea, several months ago.  A colleague
talked me into the .hgbfiles/ design because it buys a number of
features almost-for-free: renames, merges, and permission tracking.

Or at least that was the theory.  Right now, renaming big files does
not work at all.  And there is a bit of code in bfiles for making
permissions tracking work; I have to propagate the permissions bits
tracked by hg to the big files myself.  But merging works -- as long
as there are no conflicts.
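
For concreteness, the manifest-style alternative quoted above (a list of
(filename, SHA-1) tuples, with the stored blob named after its checksum)
might look roughly like the sketch below.  The one-entry-per-line text
format and the function names parse_manifest, format_manifest and
store_name are all hypothetical; nothing like this exists in bfiles today.

```python
import hashlib

def parse_manifest(text):
    """Parse lines of the assumed form '<sha1 hex> <filename>' into a dict."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first space only, so filenames may contain spaces.
        sha, name = line.split(" ", 1)
        entries[name] = sha
    return entries

def format_manifest(entries):
    """Serialize {filename: sha1} back to text, sorted for stable diffs."""
    return "".join("%s %s\n" % (entries[name], name) for name in sorted(entries))

def store_name(data):
    """Name under which a blob would be stored: the SHA-1 of its content."""
    return hashlib.sha1(data).hexdigest()
```

With this layout the store never sees a filename, so no filename
encoding is needed there; the cost is that renames, merges and
permission tracking would have to be handled by bfiles itself rather
than inherited from hg's handling of the standin files.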

> The only downside I see to this system is that the standins can no longer be
> edited by hand, nor can the diff be seen in the history view. The first is
> trivially solved by adding an "hg debugbfsetstandin" or similar. I'm not
> convinced that losing the latter is a big deal compared to never having bfiles
> work correctly on Windows, but if that really bothers you, it'd be easy enough
> to go with a (rather inscrutable) text format instead.

OK, well, you still haven't explained why bfiles can "never work
correctly on Windows".  It actually works fine read-only; we've been
using it in production since January.  Modifying and adding big files
on Windows is currently broken because I did not understand all the
slash/backslash issues between the working dir and dirstate; that is
slowly getting worked out as I port the tests so they work equally on
Unix and Windows.  I've fixed a number of bugs, but I haven't found
any showstopping "it can't possibly work on Windows" problems.
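
To illustrate the kind of slash/backslash mismatch meant here: hg
stores paths internally with forward slashes, while paths taken from a
Windows working dir use backslashes, so a lookup with the wrong form
silently misses.  The helper names to_repo_path and to_local_path below
are made up for illustration and are not bfiles or Mercurial functions;
this is only a sketch of the normalization involved.

```python
def to_repo_path(local_path, sep="\\"):
    """Convert an OS-style relative path to the slash-separated form."""
    return local_path.replace(sep, "/")

def to_local_path(repo_path, sep="\\"):
    """Convert a slash-separated path back to the local separator."""
    return repo_path.replace("/", sep)

# Stand-in for a table keyed by slash-separated paths (hypothetical data).
tracked = {".hgbfiles/data/big.bin": "tracked"}

windows_path = ".hgbfiles\\data\\big.bin"
missed = tracked.get(windows_path)               # wrong separator, no match
found = tracked.get(to_repo_path(windows_path))  # normalized, matches
```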

Greg


More information about the Mercurial-devel mailing list