[RFC] kbfiles: an extension to track binary files with less wasted bandwidth

Greg Ward greg-hg at gerg.ca
Sun Aug 7 23:37:55 UTC 2011

On Thu, Aug 4, 2011 at 11:38 AM, Na'Tosha Bard <natosha at unity3d.com> wrote:
> On Thu, Aug 4, 2011 at 4:13 PM, Greg Ward <greg-hg at gerg.ca> wrote:
>> Finally, I have two *technical* objections: the use of dirstate and
>> the use of standin files. I know, it's pretty rich for *me* to
>> criticise kbfiles/largefiles for using my design. But I'm in a pretty
>> good position to know where I got things wrong.
>> First, the use of a dedicated dirstate for big files was dumb and lazy
>> on my part. Big files have a different life-cycle from regular files,
> How so?  What do you think the typical use case is?  I'd like to compare it
> to what I see here at Unity.

We might be using the term "life-cycle" differently here. I'm not
talking about the user level; I'm talking about the constraints imposed
by the design of bfiles. (Note that I am talking about bfiles here and
*assuming* the same applies to largefiles. I haven't yet done more
than glance at the code, so I don't know how the design of largefiles
has diverged from bfiles.)

Anyways, several months ago I sat down and took a stab at drawing the
state machine for big files. See attached image. I count 13 states for
big files:

1) unknown (just created, not yet bfadd'ed)
2) added (bfadd'ed, not committed)
3) dirty added (bfadd then modify: need bfrefresh to return to state 'added')
4) committed pending (committed, not bfput)
5) missing pending (committed then deleted without bfput)
6) removed pending (committed then bfremove'd without bfput)
7) dirty pending (committed then modified without bfput)
8) modified pending (committed, modified, bfrefresh'ed without bfput)
9) clean (committed and bfput)
10) missing (committed, bfput, deleted)
11) removed (committed, bfput, bfremove'd)
12) dirty (committed, bfput, modified)
13) modified (committed, bfput, modified, bfrefresh'ed)
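
To make the list above a bit more concrete, here is a rough sketch of
it as a lookup table. This is not bfiles code: the names BigFileState,
TRANSITIONS, step and replay are hypothetical, and the action names
"modify" and "delete" are just labels for working-copy changes. The
transitions are only the ones that can be read off the parenthetical
descriptions above, so the attached diagram may well contain more
edges than this.

```python
from enum import Enum


class BigFileState(Enum):
    """The 13 states from the list, numbered the same way."""
    UNKNOWN = 1
    ADDED = 2
    DIRTY_ADDED = 3
    COMMITTED_PENDING = 4
    MISSING_PENDING = 5
    REMOVED_PENDING = 6
    DIRTY_PENDING = 7
    MODIFIED_PENDING = 8
    CLEAN = 9
    MISSING = 10
    REMOVED = 11
    DIRTY = 12
    MODIFIED = 13


S = BigFileState

# (current state, action) -> next state.
# Assumption: only transitions implied by the descriptions in the list.
TRANSITIONS = {
    (S.UNKNOWN, "bfadd"): S.ADDED,
    (S.ADDED, "modify"): S.DIRTY_ADDED,
    (S.DIRTY_ADDED, "bfrefresh"): S.ADDED,
    (S.ADDED, "commit"): S.COMMITTED_PENDING,
    (S.COMMITTED_PENDING, "delete"): S.MISSING_PENDING,
    (S.COMMITTED_PENDING, "bfremove"): S.REMOVED_PENDING,
    (S.COMMITTED_PENDING, "modify"): S.DIRTY_PENDING,
    (S.DIRTY_PENDING, "bfrefresh"): S.MODIFIED_PENDING,
    (S.COMMITTED_PENDING, "bfput"): S.CLEAN,
    (S.CLEAN, "delete"): S.MISSING,
    (S.CLEAN, "bfremove"): S.REMOVED,
    (S.CLEAN, "modify"): S.DIRTY,
    (S.DIRTY, "bfrefresh"): S.MODIFIED,
}


def step(state, action):
    """Return the next state, or raise ValueError if the sketch has no such edge."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            "no transition from %s on %r" % (state.name, action)
        ) from None


def replay(actions, start=S.UNKNOWN):
    """Apply a sequence of actions starting from `start`."""
    state = start
    for action in actions:
        state = step(state, action)
    return state
```

For example, replay(["bfadd", "commit"]) ends in COMMITTED_PENDING, and
adding "bfput" after that reaches CLEAN. The pending/non-pending split
is the part that depends on whether bfput has happened.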

The use of standins and the need to bfput new big file revs makes the
state machine considerably more complex than what dirstate tries to
track. bfiles works around this by storing more bits of state in
various other places. So far this mostly works, but it's a rich source
of bugs and an unnecessary dependency on Mercurial's internal API.
IMHO bfiles should have its own custom state-tracking mechanism that
tracks the actual life-cycle of big files, not the approximation that
dirstate can be kludged into tracking. It's on my list, but so are a
lot of things.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: bfiles-fsm-scaled.png
Type: image/png
Size: 93680 bytes
Desc: not available
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20110807/dc47b868/attachment.png>
