[RFC] Importance of fork point in changeset discovery

Mon Aug 9 03:46:06 CDT 2010

Hi,
     I have been working to retain the performance of the new revlog. I
was thinking about the possibility of a caching method, got carried away,
and ended up thinking about the DAG.

The current method looks at the tree of changesets from the top. It analyzes
each merge point from the top down to the base. This causes many round
trips to find the required changesets. But if we look at the tree from the
bottom, the points of interest become forkpoints and heads. A forkpoint is a
node which has more than one child. Considering a DAG in which each vertex
has at most two edges pointing to it, we can find the difference between two
DAGs by analyzing forkpoints and heads (can we?). If someone makes a change
in the repository, either a head changes or a forkpoint is created, or both.

The discovery protocol can (may) be:

1) Client sends its set of forkpoints, along with their degrees, and its
set of heads
2) Server works out and sends the forkpoints which are not present on both
the server and the client
3) Client analyzes the server's response and finds more information about
the changed forkpoints (why?)[0]
4) Server calculates the changesets[1]

[0] If a new forkpoint is created, its parents and their parents may not be
present on the other side, so we have to analyze further (something like a
parentforkpoint, or defining levels of forkpoints).

[1] Probably this takes multiple steps, like the current protocol, or maybe
a single step.
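
To make steps 1 and 2 a bit more concrete, here is a rough sketch in Python.
It is not Mercurial code: the function names, the dict layout of the summary
and the node names are all made up for illustration, and it only shows the
first comparison, not the follow-up analysis from [0] and [1].

```python
def discovery_summary(forkpoints, heads):
    """Step 1 (hypothetical): what one side would send.

    forkpoints: {node id: number of children}, heads: iterable of node ids.
    """
    return {"forkpoints": dict(forkpoints), "heads": set(heads)}


def differing_forkpoints(local, remote):
    """Step 2 (hypothetical): forkpoints known to only one side, or known
    to both sides but with a different degree."""
    lf, rf = local["forkpoints"], remote["forkpoints"]
    return {node for node in lf.keys() | rf.keys() if lf.get(node) != rf.get(node)}


def differing_heads(local, remote):
    """Heads that are present on exactly one side."""
    return local["heads"] ^ remote["heads"]


# Made-up node ids: the server has one extra child on "a" and a new fork at "c".
client = discovery_summary({"a": 2}, ["d", "e"])
server = discovery_summary({"a": 3, "c": 2}, ["d", "f", "g"])

print(sorted(differing_forkpoints(server, client)))  # ['a', 'c']
print(sorted(differing_heads(server, client)))       # ['e', 'f', 'g']
```

Whether these two differences are enough to work out the missing changesets
is exactly the open question; the sketch only shows what would be exchanged.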

Finding forkpoints is not that difficult. Ideally, a forkpoint is a node
which appears more than once in the parents column of the revlog index.
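
As a rough sketch of that idea (plain Python, not the real revlog API): the
index is assumed here to be a simple list where parents[rev] is a (p1, p2)
tuple and -1 means "no parent". The function name and the NULLREV constant
are made up for illustration.

```python
from collections import Counter

NULLREV = -1  # assumed marker for "no parent"


def forkpoints_and_heads(parents):
    """Return ({forkpoint rev: child count}, set of head revs).

    parents[rev] is a (p1, p2) tuple of parent revision numbers.
    """
    child_count = Counter()
    for ps in parents:
        # set() so that a changeset with p1 == p2 counts as one child
        for p in set(ps):
            if p != NULLREV:
                child_count[p] += 1
    # A forkpoint shows up more than once in the parents column.
    forkpoints = {rev: n for rev, n in child_count.items() if n > 1}
    # A head never shows up in the parents column.
    heads = {rev for rev in range(len(parents)) if child_count[rev] == 0}
    return forkpoints, heads


# rev 0 is the root, revs 1 and 2 branch off it, rev 3 merges 1 and 2,
# and rev 4 is another child of rev 1.
parents = [(-1, -1), (0, -1), (0, -1), (1, 2), (1, -1)]
forkpoints, heads = forkpoints_and_heads(parents)
print(forkpoints)  # {0: 2, 1: 2}
print(heads)       # {3, 4}
```

This is a single pass over the index, so the cost is linear in the number of
changesets.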

This is very vague and doesn't have any proof. I am writing this mail to
find out whether anyone has worked in this direction and failed, or can prove
that this doesn't work, before I waste my time.

Thanks
-- Pradeep