[RFC] kbfiles: an extension to track binary files with less wasted bandwidth

Benjamin Pollack benjamin at bitquabit.com
Mon Aug 8 13:52:14 CDT 2011


On Aug 7, 2011, at 7:37 PM, Greg Ward wrote:

> 1) unknown (just created, not yet bfadd'ed)
> 2) added (bfadd'ed, not committed)
> 3) dirty added (bfadd then modify: need bfrefresh to return to state 'added')
> 4) committed pending (committed, not bfput)
> 5) missing pending (committed then deleted without bfput)
> 6) removed pending (committed then bfremove'd without bfput)
> 7) dirty pending (committed then modified without bfput)
> 8) modified pending (committed, modified, bfrefresh'ed without bfput)
> 9) clean (committed and bfput)
> 10) missing (committed, bfput, deleted)
> 11) removed (committed, bfput, bfremove'd)
> 12) dirty (committed, bfput, modified)
> 13) modified (committed, bfput, modified, bfrefresh'ed)

I agree; that's a very complicated state diagram.  But largefiles doesn't work that way.

All of the bf* commands from bfiles are gone; largefiles performs the equivalent operations automatically.  We submitted patches enabling this behavior in bfiles a year or two ago, but there it remains optional.  It has always been mandatory for Kiln.

Axing all the bf* commands *dramatically* simplifies the state machine.  In fact, if you make this all automatic, the user-visible lifecycle for largefiles ends up being the same as for any file in Mercurial: it's unknown, added, missing, removed, or modified.  When you commit, we copy it to the repository and global caches to allow reverting and the like.  Whenever you push to a Mercurial repository, any missing largefiles are uploaded before the bundle is sent; the server rejects the bundle if the corresponding largefiles haven't yet been uploaded.  Because kbfiles and largefiles have always worked this way, determining the missing largefiles is as easy as walking the manifests of the changesets you're about to push.
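To make the push rule concrete, here is a minimal sketch of "walk the manifests of the outgoing changesets, upload whatever the server lacks, then send the bundle". It is not largefiles' actual code: `Changeset`, `commit_largefile`, `missing_largefiles`, `server_accepts`, and `push` are hypothetical names, the manifests are modeled as plain path-to-hash dicts, and keying the caches by SHA-1 is an assumption made for illustration.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class Changeset:
    """Hypothetical outgoing changeset: its node id plus the largefiles its
    manifest references, as a mapping of path -> SHA-1 hex digest."""
    node: str
    largefiles: dict[str, str] = field(default_factory=dict)


def commit_largefile(cache: dict[str, bytes], data: bytes) -> str:
    """On commit, copy the contents into a local cache keyed by SHA-1
    (assumed keying scheme) and return the hash."""
    h = hashlib.sha1(data).hexdigest()
    cache[h] = data
    return h


def missing_largefiles(outgoing, server_store) -> set[str]:
    """Walk the manifests of the changesets about to be pushed and collect
    every referenced largefile hash the server does not hold yet."""
    needed = set()
    for ctx in outgoing:
        needed.update(ctx.largefiles.values())
    return {h for h in needed if h not in server_store}


def server_accepts(bundle, server_store) -> bool:
    """Server side: reject the bundle if any largefile it references has
    not been uploaded."""
    return not missing_largefiles(bundle, server_store)


def push(outgoing, local_cache, server_store) -> bool:
    """Upload missing largefiles first, then hand the bundle to the server."""
    for h in missing_largefiles(outgoing, server_store):
        server_store[h] = local_cache[h]  # stand-in for a network upload
    if not server_accepts(outgoing, server_store):
        raise RuntimeError("server rejected bundle: largefiles missing")
    return True
```

Because every committed changeset already records its largefile hashes, the upload set falls out of the outgoing changesets alone; no separate "pending" state has to be tracked on the client.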

largefiles does currently maintain its own dirstate, but that's a legacy I have so far found convenient to keep for debugging.  There's no technical reason largefiles couldn't simply update Mercurial's dirstate file directly, so if there's agreement that's the right direction, we can ditch the separate one.  Everyone would just have to understand that, with largefiles enabled, the physical dirstate "lies".
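Whichever dirstate ends up holding the records, the point of dropping bfrefresh and friends is that a largefile's state can be derived on demand by hashing the working copy. The sketch below assumes the recorded state is just a tracked flag, a removed flag, and the SHA-1 from the parent changeset; `State`, `sha1_of`, and `largefile_status` are hypothetical names, not largefiles internals. It also includes Mercurial's ordinary "clean" status for unchanged files.

```python
import hashlib
from enum import Enum


class State(Enum):
    UNKNOWN = "unknown"
    ADDED = "added"
    CLEAN = "clean"
    MODIFIED = "modified"
    MISSING = "missing"
    REMOVED = "removed"


def sha1_of(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def largefile_status(tracked, recorded_hash, marked_removed, working_data):
    """Derive a largefile's user-visible state (hypothetical model).

    tracked:        the file is known to largefiles (added or committed)
    recorded_hash:  SHA-1 from the parent changeset, or None if only added
    marked_removed: the user ran 'hg remove' on it
    working_data:   bytes in the working copy, or None if the file is absent
    """
    if not tracked:
        return State.UNKNOWN
    if marked_removed:
        return State.REMOVED
    if working_data is None:
        return State.MISSING
    if recorded_hash is None:
        return State.ADDED
    # Hashing on demand replaces an explicit bfrefresh step.
    if sha1_of(working_data) != recorded_hash:
        return State.MODIFIED
    return State.CLEAN
```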

Incidentally, this transparent operation was the goal of kbfiles, and is the goal of largefiles: working with largefiles should involve nothing more than making sure every relevant Mercurial installation has largefiles enabled and marking which files you want managed by largefiles.  Really, the only user-visible difference with largefiles should be that "hg update" sometimes requires network access to fetch missing largefiles.  That's it.  With largefiles' support for "hg serve" out of the box, I think we're nearly there.
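The network access during "hg update" comes from materializing largefiles that aren't available locally. A minimal sketch, assuming the repository and global caches are dicts keyed by SHA-1 and that the caller supplies a `fetch_remote` callable for the network step; `update_largefiles` and its parameters are hypothetical, not the extension's API.

```python
import hashlib


def update_largefiles(target, repo_cache, global_cache, fetch_remote):
    """Materialize the largefiles referenced by the revision being checked out.

    target:       mapping path -> SHA-1 for the target revision
    repo_cache:   per-repository cache, hash -> bytes (hypothetical)
    global_cache: user-wide cache, hash -> bytes (hypothetical)
    fetch_remote: callable(hash) -> bytes; the only step needing the network
    Returns (working_files, fetched_hashes).
    """
    working = {}
    fetched = []
    for path, h in sorted(target.items()):
        if h in repo_cache:
            data = repo_cache[h]
        elif h in global_cache:
            data = global_cache[h]
            repo_cache[h] = data  # promote into the repository cache
        else:
            data = fetch_remote(h)
            # Verify the download before trusting it.
            if hashlib.sha1(data).hexdigest() != h:
                raise ValueError(f"corrupt largefile for {path}: {h}")
            fetched.append(h)
            repo_cache[h] = data
            global_cache[h] = data
        working[path] = data
    return working, fetched
```

A second update to a revision whose largefiles are already cached makes no remote calls, which is why the network is only "sometimes" needed.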

--Benjamin


More information about the Mercurial-devel mailing list