[RFC] kbfiles: an extension to track binary files with less wasted bandwidth

Na'Tosha Bard natosha at unity3d.com
Mon Aug 8 07:05:57 CDT 2011


On Mon, Aug 8, 2011 at 1:37 AM, Greg Ward <greg-hg at gerg.ca> wrote:

> On Thu, Aug 4, 2011 at 11:38 AM, Na'Tosha Bard <natosha at unity3d.com>
> wrote:
> > On Thu, Aug 4, 2011 at 4:13 PM, Greg Ward <greg-hg at gerg.ca> wrote:
> >> Finally, I have two *technical* objections: the use of dirstate and
> >> the use of standin files. I know, it's pretty rich for *me* to
> >> criticise kbfiles/largefiles for using my design. But I'm in a pretty
> >> good position to know where I got things wrong.
> >>
> >> First, the use of a dedicated dirstate for big files was dumb and lazy
> >> on my part. Big files have a different life-cycle from regular files,
> >
> > How so?  What do you think the typical use case is?  I'd like to compare
> > it to what I see here at Unity.
>
> We might be using the term "life-cycle" differently here. I'm not
> talking about user-level, I'm talking about the constraints imposed by
> the design of bfiles.


Aah yes, I interpreted your statement to mean life-cycle in terms of user
operations.


> (Note that I am talking about bfiles here and
> *assuming* the same applies to largefiles. I haven't yet done more
> than glance at the code, so I don't know how the design of largefiles
> has diverged from bfiles.)
>
> Anyways, several months ago I sat down and took a stab at drawing the
> state machine for big files. See attached image. I count 13 states for
> big files:
>
> 1) unknown (just created, not yet bfadd'ed)
> 2) added (bfadd'ed, not committed)
> 3) dirty added (bfadd then modify: need bfrefresh to return to state
> 'added')
> 4) committed pending (committed, not bfput)
> 5) missing pending (committed then deleted without bfput)
> 6) removed pending (committed then bfremove'd without bfput)
> 7) dirty pending (committed then modified without bfput)
> 8) modified pending (committed, modified, bfrefresh'ed without bfput)
> 9) clean (committed and bfput)
> 10) missing (committed, bfput, deleted)
> 11) removed (committed, bfput, bfremove'd)
> 12) dirty (committed, bfput, modified)
> 13) modified (committed, bfput, modified, bfrefresh'ed)
>
> The use of standins and the need to bfput new big file revs makes the
> state machine considerably more complex than what dirstate tries to
> track. bfiles works around this by storing more bits of state in
> various other places. So far this mostly works, but it's a rich source
> of bugs and an unnecessary dependency on Mercurial's internal API.
> IMHO bfiles should have its own custom state-tracking mechanism that
> tracks the actual life-cycle of big files, not the approximation that
> dirstate can be kludged into tracking. It's on my list, but so are a
> lot of things.
>

I don't think this is "substantially" more complicated; it just seems like
we need to keep track of one more variable -- whether the bfile has been
uploaded to the central share or not.  Maybe bfdirstate does this, but I
don't think so from what I've seen when working on the code myself.
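
To make the "one more variable" reading concrete, here is a rough sketch
(not code from bfiles or kbfiles) that rebuilds the 13 states in the list
above from three pre-commit states plus a local state crossed with an
"uploaded" flag.  The names Local, PRE_COMMIT and all_states are made up
for illustration.

```python
from enum import Enum
from itertools import product


class Local(Enum):
    """Local status of a big file that has been committed at least once."""
    COMMITTED = "committed"  # working copy matches the committed standin
    MISSING = "missing"      # deleted from disk without bfremove
    REMOVED = "removed"      # bfremove'd
    DIRTY = "dirty"          # modified, standin not refreshed
    MODIFIED = "modified"    # modified and bfrefresh'ed


# Before the first commit there is nothing to upload, so the flag is moot.
PRE_COMMIT = ["unknown", "added", "dirty added"]


def all_states():
    """Enumerate state names: pre-commit states, then Local x uploaded."""
    states = list(PRE_COMMIT)
    for uploaded, local in product([False, True], Local):
        if uploaded:
            # The list calls the uploaded, unmodified state "clean".
            name = "clean" if local is Local.COMMITTED else local.value
        else:
            name = f"{local.value} pending"
        states.append(name)
    return states


if __name__ == "__main__":
    for number, name in enumerate(all_states(), start=1):
        print(number, name)
```

Under this reading the count is 3 + 5 * 2 = 13.  It only shows how the
states can be named; it says nothing about which transitions are legal.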

A related comment is that I think there's a bit of added complication:
situations where internal Mercurial code modifies our stand-ins but of
course leaves the working copy with an incorrect bfile.  Cases where we
were not correctly updating the bfiles in the working copy to reflect the
stand-ins have been a source of bugs that I've seen -- specifically, this is
why rebasing did not work for a long time in kbfiles; I think a similar
situation caused a remove/status bug.  I've almost wondered if we need some
global way of saying, "If we call into core Mercurial code, always trust the
stand-ins to be correct and update the working copy to reflect that", but
maybe that's not safe either.  (Just brainstorming here.)
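
A minimal sketch of that "always trust the stand-ins" idea, purely as an
illustration: the repo here is a plain dict, and trust_standins,
fake_rebase and the "standins"/"working"/"store" keys are all hypothetical
names, not Mercurial or kbfiles API.

```python
import functools
import hashlib


def sha1(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def trust_standins(func):
    """After func runs, make every big file match the hash in its standin."""
    @functools.wraps(func)
    def wrapper(repo, *args, **kwargs):
        result = func(repo, *args, **kwargs)
        for name, wanted in repo["standins"].items():
            current = repo["working"].get(name)
            if current is None or sha1(current) != wanted:
                # Assumes the wanted revision is already in the local store;
                # a real version would have to fetch it or report an error.
                repo["working"][name] = repo["store"][wanted]
        return result
    return wrapper


@trust_standins
def fake_rebase(repo, name, new_hash):
    # Stands in for core code that rewrites a standin but not the big file.
    repo["standins"][name] = new_hash


if __name__ == "__main__":
    old, new = b"v1", b"v2"
    repo = {
        "standins": {"a.bin": sha1(old)},
        "working": {"a.bin": old},
        "store": {sha1(old): old, sha1(new): new},
    }
    fake_rebase(repo, "a.bin", sha1(new))
    print(repo["working"]["a.bin"])
```

Note that the wrapper silently overwrites a big file that was modified
locally but not yet refreshed, which is exactly the "maybe that's not safe"
concern.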

Cheers,
Na'Tosha

-- 
*Na'Tosha Bard*
Build & Infrastructure Developer | Unity Technologies

*E-Mail:* natosha at unity3d.com
*Skype:* natosha.bard