Google Summer of Code-2010

I am interested in "Parent delta". I can also work on conversion tools. But I am mainly focusing on Parent Delta Plan.

A word about project:

Mercurial calculates diffs against previous revision rather than parent. In some cases this implementation is space inefficient, and it is more sensible to store deltas against parent. This project is about implementing Parent Delta Plan. This implementation changes the structure of revlogs, so wire protocol has to be extended to allow backward compatibility.

============================================================================

Proposal

TO DO(Tentative): Major:

Changes in revlog:

parent delta is implemented better compression can be achieved but, may have to compromise with number of seeks.

Changes in wire protocol:

About Me:

Contact Information:

Schedule:

Timeline

Changes in revlog structure: [Total 3 weeks]

Changes in wire protocol:[Total 3 to 3 1/2 weeks]

Link to my GSoC application on bitbucket.org

============================================================================================

Work Progess:

Couple for things related to revlog

1) A script by tonfa, proof for parent delta.

2) mpm explaining how space inefficiency is caused.

3) A python script to shrink revlog by sorting topological order in mercurial repo.

Earlier in this week I started working on revlog, most of the work is already done by tonfa. He shared patches with me. All I did was understanding what he did, and rewrite those patches. With minor changes I rewrote the patches.

strategy:

As mpm suggested, I have changed the way new revision is being added to existing revlog, and make corresponding changes in creating a given revision of file. The crucial step (performance affecting step) is creating a given version. previously to create a given version of a file first we take base revision of file, to that we keep adding the deltas till given revision. Earlier it deltas are added in linear manner. So creating a given revision of file is simply seeking O(1) times. But now it is not the case.

example scenario:

We want to create a R1 revision of a file. The parents of R1 are P1 and P2. Base of R1 is B1.

Consider a situation where P2 is null, and P1 < B1. In this case I simply can't take B1 and create R1. I have first find base and parents of P1 and construct P1. The same case might repeat. where parent of P1 is < base of P1. So, first I have to find a base which is ancestor of all the parents, then create a chain of deltas from that base to construct a given revision.

Output of --profile says that, what ever the decrease in the performance is caused only by this function which used for constructing the delta chain. So, I need to find a nicer algorithm to construct the delta chain. The current algorithm for constructing the delta chain is.

    def deltachain(self, rev):
        chain = []
        while self.base(rev) != rev:
            chain.append(rev)
            rev = self.deltaparent(rev)
        chain.reverse()
        return rev, chain

    def deltaparent(self, rev):
        if self.base(rev) == rev:
            return nullrev
        elif self.flags(rev) & REVLOG_PARENTDELTA_FLAGS:
            return self.parentrevs(rev)[0]
        else:
            return rev-1

CategoryHomepage