Shallow Clone Status

ShallowClonePlan

The work done on shallow clones as part of GSoC is available as an mq patch queue. It implements shallow clones at a given shallow root, with the complete changelog but only the manifest and file nodes that correspond to the shallow root and its descendants. In effect, it clones the subgraph formed by the shallow root and its descendants. When a revlog is shallow, it contains normal index entries for the nodes that are needed and punched entries for their parents.
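As a rough illustration of the "shallow root and its descendants" subgraph, here is a minimal sketch on a toy history. The function `descendants`, the `toy_parents` mapping and the single-letter node names are hypothetical and are not part of the patch queue; the sketch assumes nodes are listed with parents before children, as in a changelog.

```python
def descendants(parents, root):
    """Return root plus every node that descends from it.

    parents maps node -> tuple of parent nodes. The mapping is assumed
    to list parents before children (changelog order), so one pass is enough.
    """
    keep = {root}
    for node, ps in parents.items():
        if any(p in keep for p in ps):
            keep.add(node)
    return keep


# Toy history: a -> b -> c, d branches off a, e merges c and d.
toy_parents = {
    "a": (),
    "b": ("a",),
    "c": ("b",),
    "d": ("a",),
    "e": ("c", "d"),
}

# Nodes whose manifest and file data a shallow clone rooted at "b" would keep.
shallow_set = descendants(toy_parents, "b")
```

With "b" as the shallow root, "d" is left out because it descends only from "a", while the merge "e" is kept because one of its parents is in the set.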

Overview of changes:

* header & index flag - The revlog can be identified as shallow by the header flag REVLOGSHALLOW and a punched entry in the index can be identified by the index flag REVIDX_PUNCHED_FLAG. Punched shallow entries have null parents and no data.

* revlog - generate groups starting with a full revision instead of a diff against the parent when the parent is not available at the receiver. For nodes with missing parents, add punched entries for the parents. The resulting index is truncated: it contains only the nodes needed for the shallow clone, plus punched entries for their missing parents. When a previously punched node is later needed, we simply append the full revision for the node. The index then contains two entries for the node, one punched and one unpunched, but the nodemap always refers to the latest (full) entry. When the node's data is needed, it is looked up by nodeid, which resolves to the correct index entry.

* changegroupsubset - add a shallowroot parameter. While calculating the nodes, find the subgraph rooted at shallowroot and collect the nodes common to the subgraph and the missing node list. We only check whether the first parent in each group exists at the receiver to decide between inserting a full revision and a delta. For missing parents, the receiver will create the needed punched entries.

* changelog - create a set of shallow revs (descendants of shallowroot) and update it as new revisions are added. When reading from the changelog, if a given node is not part of the shallow clone, set its manifest to nullid.

* discovery - determine shallowroot for push & pull operations. Check whether the clones are related and find the shallower of the two clones, since it will have less data than the other. The changegroup is calculated with respect to the shallower root.

* add shallowroot parameters to the various functions to enable local shallow cloning, push and pull. The '-s shallowrev' option to clone creates a shallow clone with shallowrev as root.
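The punched-entry behaviour described for the revlog can be sketched with a toy in-memory index. This is only an illustration under stated assumptions: the class `ToyShallowIndex`, its methods and the `PUNCHED` bit value are hypothetical stand-ins, not the real revlog code or the real value of REVIDX_PUNCHED_FLAG.

```python
NULLID = b"\0" * 20
PUNCHED = 1 << 0  # placeholder bit standing in for REVIDX_PUNCHED_FLAG (value assumed)


class ToyShallowIndex:
    """Hypothetical append-only index with punched entries."""

    def __init__(self):
        self.entries = []  # (node, p1, p2, flags, data), append-only
        self.nodemap = {}  # node -> position of the latest entry for that node

    def _append(self, node, p1, p2, flags, data):
        self.nodemap[node] = len(self.entries)
        self.entries.append((node, p1, p2, flags, data))

    def add(self, node, p1, p2, data):
        # Punch every parent we have never seen: null parents, no data.
        for p in (p1, p2):
            if p != NULLID and p not in self.nodemap:
                self._append(p, NULLID, NULLID, PUNCHED, None)
        # Append the full entry; this also supersedes an earlier punched one.
        self._append(node, p1, p2, 0, data)

    def ispunched(self, node):
        return bool(self.entries[self.nodemap[node]][3] & PUNCHED)

    def data(self, node):
        entry = self.entries[self.nodemap[node]]
        if entry[3] & PUNCHED:
            raise LookupError("node is punched; no data stored")
        return entry[4]


A, B = b"a" * 20, b"b" * 20
idx = ToyShallowIndex()
idx.add(B, A, NULLID, b"full text of B")       # A is missing, so it gets punched
idx.add(A, NULLID, NULLID, b"full text of A")  # A is needed later: appended in full
```

After the two calls the index holds three entries (punched A, B, full A), and the nodemap points at the full entry for A, so a lookup by nodeid never lands on the stale punched entry.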

TBD

* add commands to retrieve shallowroot of remote repo & shallowroot argument for new changegroupsubset in wireproto.

* creating shallow clones from a bundled shallow repo. Bundles created by shallow clones will include the shallowroot as part of the bundle header (new bundle).

* groups should start with a delta against the parent available at the receiver, instead of a full revision, if the first parent is missing (new bundle).

* deepening of a shallow clone, i.e. updating the repo with history starting at a node which is an ancestor of the current shallowroot. It might be possible using changegroupsubset by choosing the commonrevs and missinglist, ideally returning only the data for the new shallowroot that we don't have, i.e. descendants(new root) - descendants(old root), since we already have descendants(old root).
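The set difference mentioned for deepening can be sketched on a toy graph. The helper names `descendants` and `deepening_nodes` and the `toy_parents` mapping are hypothetical, and the sketch assumes nodes are listed with parents before children; it says nothing about how changegroupsubset would actually be driven.

```python
def descendants(parents, root):
    """Return root plus all of its descendants (parents listed before children)."""
    keep = {root}
    for node, ps in parents.items():
        if any(p in keep for p in ps):
            keep.add(node)
    return keep


def deepening_nodes(parents, old_root, new_root):
    """Nodes to fetch when moving the shallow root from old_root to new_root.

    Computes descendants(new root) - descendants(old root); new_root is
    assumed to be an ancestor of old_root.
    """
    return descendants(parents, new_root) - descendants(parents, old_root)


# Toy history: a -> b -> c, d branches off a, e merges c and d.
toy_parents = {
    "a": (),
    "b": ("a",),
    "c": ("b",),
    "d": ("a",),
    "e": ("c", "d"),
}

# Deepen a clone rooted at "b" so that it is rooted at "a".
to_fetch = deepening_nodes(toy_parents, "b", "a")
```

In this toy case only "a" and "d" need data transferred, since "b", "c" and "e" are already present in the clone rooted at "b".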


CategoryGsoc

ShallowCloneStatus (last edited 2010-10-22 18:47:40 by mpm)