Differences between revisions 2 and 17 (spanning 15 versions)
Revision 2 as of 2010-04-10 17:26:14
Size: 6158
Comment:
Revision 17 as of 2010-08-03 17:10:13
Size: 3804
Comment:
Line 8: Line 8:
IRC : vsh
irc : vsh
Line 10: Line 10:
GSoC application:
gsoc proposal:http://bitbucket.org/vsh/shallow-proposal/src
Line 12: Line 12:
== Shallow Cloning in Mercurial [GSoC proposal] ==
:Author: Vishakh Harikumar <vsh426@gmail.com>
:Description: Google Summer of Code proposal to work on Mercurial Shallow Clone feature
mq: http://bitbucket.org/vsh/hg-shallow-clone/
Line 16: Line 14:
=== Abstract ===
= Journal =
Line 18: Line 16:
This proposal is about adding support for shallow cloning in Mercurial. The
feature will allow cloning the most recent parts of [large] repositories in
situations where limits on resources such as storage space, network bandwidth
and reliability prevent the creation of a full clone.
100616 write script to get size stats of revlog in a repo, look into discovery.py
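A script of the kind mentioned in the 100616 entry might look roughly like the sketch below. It is not the author's script: it only assumes the usual on-disk layout, where revlog index files end in `.i` and data files in `.d` under `.hg/store`, and the function name `revlog_sizes` is hypothetical.

```python
import os
from collections import defaultdict


def revlog_sizes(repo_root):
    """Sum on-disk sizes of revlog files under <repo_root>/.hg/store.

    Assumption: revlog index files end in ".i" and data files in ".d".
    Returns a dict mapping a category name to its total size in bytes.
    """
    store = os.path.join(repo_root, ".hg", "store")
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(store):
        for name in filenames:
            if not name.endswith((".i", ".d")):
                continue  # skip non-revlog files such as fncache
            if name.startswith("00changelog"):
                kind = "changelog"
            elif name.startswith("00manifest"):
                kind = "manifest"
            else:
                kind = "filelogs"
            totals[kind] += os.path.getsize(os.path.join(dirpath, name))
    return dict(totals)
```

Calling `revlog_sizes(".")` from the root of a working copy would give a rough idea of how much space the changelog, the manifest and the file logs each take, which is the kind of number that matters when deciding what a shallow clone should drop.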
Line 23: Line 18:
=== Introduction ===
100617 investigate consequences of pruning revisions.
Line 25: Line 20:
Mercurial is widely used by people and organizations as their tool for version
control. Many large repositories are managed by it. The drawback is that
anybody who wants to work with the repository has to clone the repository in its
entirety. The use cases for such situations boil down to cloning a limited
subset of the complete repository starting from a particular revision, i.e. a shallow clone.
100618 look into performance issue.
Line 31: Line 22:
The shallow clone should work seamlessly with other clones, which may be
full or shallow, when performing push or pull operations. When earlier history is
required it should be possible to deepen the clone by retrieving earlier revisions.
Guidelines for the implementation are in the Shallow Clone Plan[1] and will
also include discussions with the rest of the community to flesh out details.
100619-100621 fix punching in pull and changegroupsubset in localrepo
Line 37: Line 24:
=== Goals ===
100621 fix revlog to create punched groups
Line 39: Line 26:
The goals I see for the project are:
    * Implementing Trimming History
    * Creation of local Shallow Clones
    * Push, Pull for local Shallow Clones
    * Tests to define Shallow Clones
    * Support deepening of Shallow Clones
    * Update bundle format and wire protocol
    * Additional tests for network clones
100622 working on making revlog accept punched revisions
Line 48: Line 28:
===== Trimming History =====
----
Line 50: Line 30:
Trimming of history will allow removing unwanted history from the repository,
from individual revisions and ranges to entire branches. I plan to implement this using
the punch approach as described in the wiki[2]. This involves removing deltas from the
datafile and updating their length in the indexfile to -1. Problems to solve in this
approach are situations where deltas might not patch correctly, and making sure hg
itself is aware of the trimmed history. Trimming will allow the size of the
repository to be reduced, keeping only the parts of the history that are needed.
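The punch idea can be illustrated with a toy model. The sketch below is not Mercurial's revlog code: `ToyRevlog`, `IndexEntry` and `PUNCHED` are hypothetical names, and each revision is stored as an independent chunk rather than as a delta against an earlier one. It only shows the bookkeeping described above, where the chunk is cut out of the data buffer and the index entry's length is set to -1.

```python
from dataclasses import dataclass
from typing import List

PUNCHED = -1  # length marker for a removed chunk, as in the proposal text


@dataclass
class IndexEntry:
    offset: int  # start of this revision's chunk in the data buffer
    length: int  # chunk length in bytes, or PUNCHED


class ToyRevlog:
    """Toy stand-in for an index file plus a data file."""

    def __init__(self):
        self.index: List[IndexEntry] = []
        self.data = bytearray()

    def add(self, chunk: bytes) -> int:
        """Append a chunk and return its revision number."""
        self.index.append(IndexEntry(len(self.data), len(chunk)))
        self.data += chunk
        return len(self.index) - 1

    def punch(self, rev: int) -> None:
        """Drop the chunk for `rev` and mark its index entry as punched."""
        entry = self.index[rev]
        if entry.length == PUNCHED:
            return  # already punched, nothing to remove
        removed = entry.length
        del self.data[entry.offset:entry.offset + removed]
        entry.length = PUNCHED
        # Later chunks moved towards the start of the buffer.
        for later in self.index[rev + 1:]:
            later.offset -= removed

    def chunk(self, rev: int) -> bytes:
        """Return the stored chunk, or fail if it has been punched."""
        entry = self.index[rev]
        if entry.length == PUNCHED:
            raise LookupError("revision %d has been punched" % rev)
        return bytes(self.data[entry.offset:entry.offset + entry.length])
```

Because the toy stores independent chunks, it sidesteps the harder problem named above: in a real revlog a later delta may depend on the punched one, so it would no longer patch correctly.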
100623 add flags to record punched revisions
Line 58: Line 32:
===== Creation of local Shallow Clones =====
100624 fix up revlog.addgroup to write punched revisions
Line 60: Line 34:
Local shallow cloning will work by keeping the complete changelog while truncating
the manifests and file logs, using the trimming command to remove all of their
history before that of the shallow root. This phase will also involve making decisions
about Mercurial's view of shallow clones, such as the storage of the full version
and the deltas of the text, and modifications to the revlog and bundle format to support
shallow clones. Tests at this stage will define the structure of the clone
and will be used for regression testing as more goals are added.
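One possible reading of "history before the shallow root" is "every revision that is not the shallow root or one of its descendants". The sketch below follows that reading on a toy changelog. The function names and the dict-based data layout are assumptions for illustration, not the proposed implementation.

```python
def descendants(parents, root):
    """Return `root` plus every revision descended from it.

    `parents` maps rev -> tuple of parent revs. Assumption: revs are
    integers numbered so that parents come before their children.
    """
    keep = {root}
    for rev in sorted(parents):
        if any(p in keep for p in parents[rev]):
            keep.add(rev)
    return keep


def shallow_clone(changelog, manifests, root):
    """Keep the whole changelog but only manifests at or after `root`."""
    keep = descendants(changelog, root)
    kept_manifests = {r: m for r, m in manifests.items() if r in keep}
    return dict(changelog), kept_manifests


# Toy history: 0 -> 1, then two branches 2 and 3, merged in 4.
changelog = {0: (), 1: (0,), 2: (1,), 3: (1,), 4: (2, 3)}
manifests = {rev: "manifest-%d" % rev for rev in changelog}
log, kept = shallow_clone(changelog, manifests, root=2)
```

With revision 2 as the shallow root, the changelog keeps all five entries while manifests survive only for revisions 2 and 4. The merge in 4 has a parent (3) whose manifest is gone, which is the sort of case a real implementation has to handle.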
100625 -
Line 68: Line 36:
===== Push, Pull and Bundle local Repos =====
100626 get simple shallow clone running and investigate performance
Line 70: Line 38:
[TODO]
100627 look at some bugs in bts
Line 72: Line 40:
===== Tests to define Shallow Clones =====
100628 fix naive performance issues in current code, patch for mq issue
Line 74: Line 42:
At this point shallow cloning of a local repository will be complete. I will write
additional tests to exercise all possible cases. A comprehensive test suite will
define all the functions of shallow clones and can further be used to test shallow
clones that have been created over the network.
----
Line 79: Line 44:
===== Support Shallow Cloning over network =====
100630 -
Line 81: Line 46:
Cloning over networks is done with the wire protocol. It does not currently support
shallow cloning, since it cannot work with individual changesets, only a stream of
changegroups encoded in the bundle format. First I will update the bundle format
to include enough information to create a shallow clone at a given revision. This will
be useful in the wire protocol. There already exists a plan for updating the wire
protocol. I will be coordinating with others working on the same, and add support
for shallow clones. This will enable shallow cloning over networks as well.
100701 redo all patches by splitting them.
Line 89: Line 48:
===== Additional tests for Network Shallow Clones =====
100702 stumped by error after redo, till i found i lost some code during splitting. used an mq on the mq to fix it (hopefully won't be doing that again anytime soon)
Line 91: Line 50:
Write tests for wire protocol, bundle format and network clones. This should
complete the test suite for Shallow clones. I will also be updating the wiki
and help to cover all aspects of shallow clones.
100703 changegroupsubset is too slow for whole repo, while doing the collecting in _changegroup is rather redundant. move code to _changegroup for now to improve performance, while considering other options to improve situation including adding fastpath to changegroupsubset.
Line 95: Line 52:
=== Timeline ===
100704 looked into: wire protocol, bugs from bts, review patches from ml.
Line 97: Line 54:
I am working through the details of shallow clones and will probably start
coding it before the official start date of the program. I have my final exams
in the first 2 weeks of May. The rest of the time I should be able to concentrate
on Shallow cloning.
100705 spent the day [with attempts] fixing some bugs. sent a patch for issue1881 and some unsatisfying fixes for others.
Line 102: Line 56:
    * Implementing Trimming History [ 2 weeks ]
    * Creation of local Shallow Clones [ 1 week ]
    * Push, Pull for local Shallow Clones [ 1.5 weeks ]
    * Tests for local Shallow Clones [ .5 weeks ]
    * Support deepening of Shallow Clones [ 1.5 weeks ]
    * Update bundle format and wire protocol [ 1 week ]
    * Shallow cloning over network [ 2 weeks ]
    * Additional tests/ing for network clones [ 1 week ]
----
Line 111: Line 58:
=== About ===
100708 looking through parren's shallow-clone approach
Line 113: Line 60:
I am a final-year BTech student at MPSTME, India. I have written programs in C and
Basic, with short stints in Java and Visual Basic (they made me do it :). Currently
most of my programming is in Python. I discovered Mercurial over a year ago and
have been using it for all my projects since. When I first found Mercurial I read
through the earliest commits in its repo and in the process gained a better understanding
of its internals. I have since read through many modules in tip, for a better
understanding of shallow cloning as well. I intend to make contributions to Mercurial
in the future, via GSoC or otherwise.
100709 create shallow flag for revlogs and shallow file in localrepo to identify shallow revlogs and repos. have changelog maintain list of shallownodes so we know when not to try reading manifest etc.
Line 122: Line 62:
This document and all related work are available at http://bitbucket.org/vsh/hg-shallow/
100710 modifications to changegroupsubset to work better with shallowclones. find the right nodes needed in changegroup, considering full revisions and deltas.
Line 124: Line 64:
Contact Information
100711 make revlog generate smaller groups, without punched deltas, and only the required nodes. when adding nodes to revlog create punched parent nodes that do not exist.
Line 126: Line 66:
    * Email: vsh426@gmail.com
    * IRC: vsh
100712 looking into bts 1881, 947 and considering 2091 & 2284(which look similar)
Line 129: Line 68:
=== References ===
100713 debugging truncation.
Line 131: Line 70:
[1] http://mercurial.selenic.com/wiki/ShallowClonePlan
[2] http://mercurial.selenic.com/wiki/TrimmingHistory
----
Line 134: Line 72:
100714:100716 finish up working local shallow clone with pull and corresponding test, patchbomb ml.

100717 - (create simple [[debugshellExtension]])

100718 write additional tests, work on push from full to shallow clone, discovery(prepush) and investigate problem with updating to the correct revision on initial clone.

100719 finally resolve issue1881, start looking through new wireproto.

----

100721:100723 winding up with college work, not much progress, update patches with reviews and changes upstream.

100724 inspect refactoring changegroupsubset* for shallowclones and extranodes usage.

100726 worked through most of wireproto changes and ready to start work on it pending talking with others.

----

100728 fix performance issue of reading too many manifests. now only read manifests when merge includes node not descendant of shallowroot and for other cases use normal changegroupsubset mechanism.

100729 figure out when the diff parent for revlog.group is available and when to send complete revisions, should be easy to find delta parent using the same.

100730 update patches from reviews.

100731 -

100801 explore some ways to handle pulling previously punched revision, and start participating in protocol discussion.

100802 cleanup patches to push, and make tests for covering cases of pulling in previously punched revisions.

Vishakh Harikumar

Email: <vsh426 AT SPAMFREE gmail DOT com>

irc : vsh

gsoc proposal:http://bitbucket.org/vsh/shallow-proposal/src

mq: http://bitbucket.org/vsh/hg-shallow-clone/

Journal

100616 write script to get size stats of revlog in a repo, look into discovery.py

100617 investigate consequences of pruning revisions.

100618 look into performance issue.

100619-100621 fix punching in pull and changegroupsubset in localrepo

100621 fix revlog to create punched groups

100622 working on making revlog accept punched revisions


100623 add flags to record punched revisions

100624 fix up revlog.addgroup to write punched revisions

100625 -

100626 get simple shallow clone running and investigate performance

100627 look at some bugs in bts

100628 fix naive performance issues in current code, patch for mq issue


100630 -

100701 redo all patches by splitting them.

100702 stumped by error after redo, till i found i lost some code during splitting. used an mq on the mq to fix it (hopefully won't be doing that again anytime soon)

100703 changegroupsubset is too slow for whole repo, while doing the collecting in _changegroup is rather redundant. move code to _changegroup for now to improve performance, while considering other options to improve situation including adding fastpath to changegroupsubset.

100704 looked into: wire protocol, bugs from bts, review patches from ml.

100705 spent the day [with attempts] fixing some bugs. sent a patch for issue1881 and some unsatisfying fixes for others.


100708 looking through parren's shallow-clone approach

100709 create shallow flag for revlogs and shallow file in localrepo to identify shallow revlogs and repos. have changelog maintain list of shallownodes so we know when not to try reading manifest etc.

100710 modifications to changegroupsubset to work better with shallowclones. find the right nodes needed in changegroup, considering full revisions and deltas.

100711 make revlog generate smaller groups, without punched deltas, and only the required nodes. when adding nodes to revlog create punched parent nodes that do not exist.

100712 looking into bts 1881, 947 and considering 2091 & 2284(which look similar)

100713 debugging truncation.


100714:100716 finish up working local shallow clone with pull and corresponding test, patchbomb ml.

100717 - (create simple debugshellExtension)

100718 write additional tests, work on push from full to shallow clone, discovery(prepush) and investigate problem with updating to the correct revision on initial clone.

100719 finally resolve issue1881, start looking through new wireproto.


100721:100723 winding up with college work, not much progress, update patches with reviews and changes upstream.

100724 inspect refactoring changegroupsubset* for shallowclones and extranodes usage.

100726 worked through most of wireproto changes and ready to start work on it pending talking with others.


100728 fix performance issue of reading too many manifests. now only read manifests when merge includes node not descendant of shallowroot and for other cases use normal changegroupsubset mechanism.

100729 figure out when the diff parent for revlog.group is available and when to send complete revisions, should be easy to find delta parent using the same.

100730 update patches from reviews.

100731 -

100801 explore some ways to handle pulling previously punched revision, and start participating in protocol discussion.

100802 cleanup patches to push, and make tests for covering cases of pulling in previously punched revisions.


CategoryHomepage

VishakhHarikumar (last edited 2010-08-03 17:10:13 by VishakhHarikumar)