Vishakh Harikumar

Email: <vsh426 AT SPAMFREE gmail DOT com>

IRC : vsh

GSoC application: The updated version is available at http://bitbucket.org/vsh/hg-shallow/src/

Shallow Cloning in Mercurial [GSoC proposal]

:Author: Vishakh Harikumar <vsh426@gmail.com>
:Description: Google Summer of Code proposal to work on Mercurial Shallow Clone feature

Abstract

This proposal adds support for shallow cloning to Mercurial. The feature will allow cloning only the most recent part of a [large] repository when limits on storage space, network bandwidth, or network reliability prevent the creation of a full clone.

Introduction

Mercurial is widely used by individuals and organizations for version control, and it manages many large repositories. The drawback is that anybody who wants to work with a repository has to clone it in its entirety. The use cases in such situations boil down to cloning a limited subset of the complete repository, starting from a particular revision: the shallow clone.

The shallow clone should work seamlessly with other clones, whether full or shallow, when performing push or pull operations. When earlier history is required, it should be possible to deepen the clone by retrieving earlier revisions. The implementation will follow the guidelines in the Shallow Clone Plan[1], with details fleshed out in discussion with the rest of the community.

Goals

The goals I see for the project are:

Trimming History

Trimming will allow unwanted history to be removed from the repository, from individual revisions and revision ranges up to entire branches. I plan to implement this using the punch approach described in the wiki[2]. This involves removing deltas from the datafile and updating the corresponding lengths in the indexfile to -1. The problems to solve with this approach are situations where deltas might not patch correctly, and making sure hg itself is aware of the trimmed history. Trimming will reduce the size of the repository by keeping only the parts of the history that are needed.
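The punch idea can be illustrated with a minimal sketch. This is not Mercurial's revlog code: `ToyRevlog`, `IndexEntry` and `PUNCHED` are hypothetical names, the "files" are in-memory buffers, and each entry is treated as independent bytes, so the delta-chain problem mentioned above is deliberately left out.

```python
from dataclasses import dataclass

PUNCHED = -1  # sentinel length for a trimmed revision, as described in the proposal


@dataclass
class IndexEntry:
    offset: int  # start of the delta in the data buffer
    length: int  # delta length in bytes, or PUNCHED once trimmed


class ToyRevlog:
    """Hypothetical in-memory stand-in for an indexfile plus a datafile."""

    def __init__(self):
        self.index = []
        self.data = bytearray()

    def add(self, delta: bytes) -> int:
        """Append a delta and return its revision number."""
        self.index.append(IndexEntry(len(self.data), len(delta)))
        self.data += delta
        return len(self.index) - 1

    def read(self, rev: int) -> bytes:
        """Return the stored bytes, refusing revisions that were trimmed."""
        entry = self.index[rev]
        if entry.length == PUNCHED:
            raise LookupError(f"revision {rev} has been trimmed")
        return bytes(self.data[entry.offset:entry.offset + entry.length])

    def punch(self, revs) -> None:
        """Drop the deltas of revs and mark their index entries as PUNCHED."""
        doomed = set(revs)
        new_data = bytearray()
        for rev, entry in enumerate(self.index):
            if entry.length == PUNCHED:
                continue  # already trimmed, nothing left to copy
            if rev in doomed:
                entry.length = PUNCHED
                continue
            # Surviving delta: copy it and record its new offset.
            chunk = self.data[entry.offset:entry.offset + entry.length]
            entry.offset = len(new_data)
            new_data += chunk
        self.data = new_data
```

The index keeps one entry per revision, so revision numbers stay stable after trimming and a reader can tell a trimmed revision from a missing one by checking for the -1 length.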

Creation of local Shallow Clones

Local shallow cloning will work by keeping the complete changelog while truncating the manifests and file logs: the trimming command will remove all of their history before the shallow root. This phase will also involve making decisions about Mercurial's view of shallow clones, such as the storage of the full version and the deltas of the text, and modifications to the revlog and bundle formats to support shallow clones. Tests written at this stage will define the structure of the clone and will be used for regression testing as more goals are added.
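As a rough sketch of which revisions such a clone would trim, the following assumes that "before the shallow root" means the strict ancestors of the root, and that revisions on unrelated branches are left alone. The `parents` mapping and the function names are hypothetical and not part of Mercurial.

```python
def ancestors(parents, root):
    """Return the strict ancestors of root in a DAG given as rev -> parent revs."""
    seen = set()
    stack = list(parents[root])
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(parents[rev])
    return seen


def trim_plan(parents, root):
    """Decide what a shallow clone rooted at root keeps (assumed policy)."""
    trimmed = ancestors(parents, root)
    return {
        # The changelog is kept complete, as stated in the proposal.
        "changelog": set(parents),
        # Manifest and file log data survive only outside the trimmed set.
        "manifest_and_filelogs": set(parents) - trimmed,
        "trimmed": trimmed,
    }


# Small history: 0 - 1 - 2 - 4, with a side branch 1 - 3 merged into 4.
history = {0: (), 1: (0,), 2: (1,), 3: (1,), 4: (2, 3)}
plan = trim_plan(history, root=2)
```

With the shallow root at revision 2, the plan trims revisions 0 and 1 and keeps manifest and file log data for 2, 3 and 4. Revision 3 shows why the policy needs a decision: it is not a descendant of the root, yet it is needed by the merge in revision 4.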

Push, Pull and Bundle local Repos

[TODO]

Tests to define Shallow Clones

At this point shallow cloning of a local repository will be complete. I will write additional tests to exercise all possible cases. A comprehensive test suite will define all the functions of shallow clones and can later be used to test shallow clones that have been created over the network.

Support Shallow Cloning over network

Cloning over networks is done with the wire protocol. It does not currently support shallow cloning, since it cannot work with individual changesets, only with a stream of changegroups encoded in the bundle format. First I will update the bundle format to include enough information to create a shallow clone at a given revision. This will be useful in the wire protocol. A plan for updating the wire protocol already exists. I will coordinate with others working on it and add support for shallow clones. This will enable shallow cloning over networks as well.

Additional tests for Network Shallow Clones

I will write tests for the wire protocol, the bundle format and network clones. This should complete the test suite for shallow clones. I will also update the wiki and the help text to cover all aspects of shallow clones.

Timeline

I am working through the details of shallow clones and will probably start coding it before the official start date of the program. I have my final exams in the first 2 weeks of May. The rest of the time I should be able to concentrate on Shallow cloning.

About

I am a final-year BTech student at MPSTME, India. I have written programs in C and Basic, with short stints in Java and Visual Basic (they made me do it :). Currently most of my programming is in Python. I discovered Mercurial over a year ago and have been using it for all my projects since. When I found Mercurial I read through the earliest commits in its repo, and in the process gained a better understanding of its internals. I have since read through many modules in tip to better understand what shallow cloning involves. I intend to make contributions to Mercurial in the future, via GSoC or otherwise.

This document and all related work are available at http://bitbucket.org/vsh/hg-shallow/

Contact Information

References

[1] http://mercurial.selenic.com/wiki/ShallowClonePlan

[2] http://mercurial.selenic.com/wiki/TrimmingHistory

