RFC: Shallow Cloning, Take 2

Fri Jul 3 11:01:37 CDT 2009

On Sat, May 16, 2009 at 8:58 AM, Peter
Arrenbrecht <peter.arrenbrecht at gmail.com> wrote:
> A key problem for both is how they cope with formerly absent revs
> suddenly being needed. This typically happens when a merge brings in
> filerevs from a formerly unrelated branch [1].
>
> With punching, the diff data for absent revs is not stored in the
> revlog, but all linkage is (nodeid, parentrevs, linkrev). So when a
> formerly absent rev is needed, I see two choices:
>
>  * Rewrite the revlog to add the formerly missing data. This breaks
> append-onlyness during pull.
>
>  * Append a new entry for the revision, this time including the data,
> and with a flag indicating it is a dupe. This could work, but it might
> seriously break the rev -> index offset assumptions in the lazy index.

Here's another idea for punching: We could keep formerly absent rev
data in two separate files, thefile.ai/ad (a for additional) alongside
thefile.i/d. In these additional files, we simply use the same format
as in .i/.d, and the same machinery for maintaining them, but we are
not constrained by topological ordering here.

So when we get a formerly absent rev's data, we just store it in .ai/.ad.

And when a rev is found missing in .i/.d, we simply look it up in
.ai/.ad. Here we always read the entire index, as I don't assume it
will ever get large (after all, this is a shallow clone, so I don't
expect to have tons of history here in the first place).
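
To make the lookup order concrete, here is a minimal sketch as a toy
Python model. It is not Mercurial's revlog code: ToyRevlog,
PunchedFilelog and all their methods are hypothetical names, and the
toy stores full texts where a real revlog would store deltas. It only
illustrates that .i/.d stays append-only while late-arriving data goes
to the side store, which is created on first use.

```python
# Toy model only: these classes are hypothetical and do not correspond
# to Mercurial's actual revlog API.


class ToyRevlog:
    """Append-only store; an entry's data may be absent (None)."""

    def __init__(self):
        self.entries = []   # (node, parents, data or None), in append order
        self.by_node = {}   # node -> rev number

    def add(self, node, parents, data):
        self.by_node[node] = len(self.entries)
        self.entries.append((node, parents, data))

    def data(self, node):
        """Return the stored data, or None if unknown or punched."""
        rev = self.by_node.get(node)
        if rev is None:
            return None
        return self.entries[rev][2]


class PunchedFilelog:
    """Main store plus an optional side store for formerly absent revs."""

    def __init__(self):
        self.main = ToyRevlog()   # stands in for thefile.i/.d
        self.extra = None         # stands in for thefile.ai/.ad

    def add_full(self, node, parents, data):
        self.main.add(node, parents, data)

    def add_punched(self, node, parents):
        # Linkage is recorded, but the data is left out.
        self.main.add(node, parents, None)

    def fill_absent(self, node, data):
        """Store data for a punched rev without touching the main store."""
        rev = self.main.by_node[node]   # KeyError if the rev is unknown
        if self.extra is None:
            self.extra = ToyRevlog()    # side files are created on demand
        self.extra.add(node, self.main.entries[rev][1], data)

    def read(self, node):
        """Look in the main store first, then fall back to the side store."""
        data = self.main.data(node)
        if data is None and self.extra is not None:
            data = self.extra.data(node)
        if data is None:
            raise LookupError("no data for %r" % (node,))
        return data
```

In this model, calling fill_absent() for a punched rev leaves
main.entries untouched, and a later read() of that rev is served from
the side store.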

These additional files would only be created for files where we need
formerly absent revs. This only happens when you pull merges with
branches that are older than your shallow root. So in the case of Hg,
I guess a shallow clone rooted at Hg 1.2 would never see them.

-parren