Wiki: WireProtocol

Matt Mackall mpm at selenic.com
Tue Sep 6 16:21:34 CDT 2005


On Tue, Sep 06, 2005 at 01:43:51PM -0700, Eric M. Hopper wrote:
> On Tue, 2005-09-06 at 11:06 -0700, Matt Mackall wrote:
> > Then the server hands back undefined garbage.
> 
> *grin*  Oh, good.  I was hoping it didn't crash when you did that.  :-)
> 
> > > Do you walk from the base to the tip when
> > > returning elements?
> > 
> > No, we walk from tip to base. Between is defined to operate on a
> > section of history with no merges, so each element has exactly one
> > parent. So we simply walk tip->p1->p1->...->base. Then we return
> > elements 1, 2, 4, 8,.. of the resulting list.
> > 
> > > If you don't, how is a search accomplished?  Since
> > > presumably it's the tip that won't be known, you'll always return the
> > > same pattern of elements no matter what the client specifies as the
> > > base.
> > 
> > Huh?
> 
> Presumably the client is searching for the first changeset in the line
> that it doesn't possess.  If it worked as you described....
> 
> tip->c1->c2->c3->c4->c5->c6->c7->c8->base
> 
> The client would then get c1, c2, c4 and c8 and would see that it didn't
> have c4, but did have c8.  So then (though I realize now that I'm making
> an error) the client would ask for tip->c8 instead of tip->base, and it
> would again get c1, c2, and c4, and not have any more information than
> before.
> 
> But, I bet what really happens (and I was too stupid to see
> previously  :-) is that the client then asks for c4->c8, and it would
> then get c5 and c6, and if it didn't have either c5 or c6, it would
> ask for c6->c8, then discover it did have c7.

Yep. It's a sort of open-ended bisection search.
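For concreteness, the exchange above can be sketched in Python. This is a toy model, not Mercurial's actual code: `between` and `find_boundary` are made-up names, and a plain list stands in for the server's linear run of history.

```python
def between(chain, top, bottom):
    # Server side: walk the linear chain from 'top' down toward 'bottom'
    # and return elements 1, 2, 4, 8, ... of that walk.
    i, j = chain.index(top), chain.index(bottom)
    span = chain[i:j]          # top ... the node just above bottom
    out, k = [], 1
    while k < len(span):
        out.append(span[k])
        k *= 2
    return out

def find_boundary(chain, have, top, bottom):
    # Client side: narrow (top, bottom) until 'top' is the oldest node
    # the client lacks and 'bottom' is the newest one it already has.
    while True:
        sample = between(chain, top, bottom)
        if not sample:
            return top, bottom
        known = [n for n in sample if n in have]
        if not known:
            top = sample[-1]            # whole sample unknown: move down
        elif sample.index(known[0]) == 0:
            bottom = known[0]           # first sampled node already known
        else:
            i = sample.index(known[0])  # boundary sits inside the sample
            top, bottom = sample[i - 1], known[0]
```

With the chain tip->c1->...->c8->base and a client that already has c7 and c8, this reproduces the exchange above: the first query returns c1, c2, c4, c8; the next (c4 to c8) returns c5 and c6; the next (c6 to c8) returns c7, and the search stops with c6 as the oldest missing changeset.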
 
> > What request batching can be done is done internally. There's not a
> > lot of opportunity for parallelism here.
> 
> I can see some opportunities, but they aren't major.  Mostly you could
> make some of the initial 'figure out what the roots are' requests more
> fine grained so you could interleave searches along all the different
> branches at the same time.

What currently happens is that the client batches a bunch of the
branches and between queries into single requests.
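As a sketch of what that batching might look like on the wire (the pair syntax and function names here are illustrative, not necessarily Mercurial's actual encoding):

```python
def batch_queries(pairs):
    # Combine several (top, bottom) 'between' queries into one request
    # payload so they share a single connection and round trip.
    return " ".join("%s-%s" % (top, bottom) for top, bottom in pairs)

def unbatch_queries(payload):
    # Server side: split the payload back into the individual pairs.
    return [tuple(p.split("-")) for p in payload.split(" ")]
```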

> And I wouldn't mean that the server had to do that in parallel on the
> remote side.  The goal would be to get the server doing some work while
> the next request was coming, and have stuff for the client to munch on
> while it was waiting for the response to the last request it sent.
> 
> But that probably wouldn't help in most situations.  It only helps when
> you have lots of parallel lines of changesets that haven't been fetched.
> And I suspect the effects would likely be minor even then.

The tree-walking expense on either side is fairly minimal. What hurts
is:

- opening a new connection
- loading the CGI script
- reading the changeset index
- per-request network round-trip

So we generally want to do as much batching of requests as possible.
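A toy cost model makes the point (the numbers are purely illustrative, not measurements): the fixed per-request overhead dominates, so folding n queries into one request pays off quickly.

```python
FIXED_MS = 200    # connect + CGI start-up + index read + round trip (illustrative)
PER_QUERY_MS = 2  # tree walk for one query (illustrative)

def cost_unbatched(n):
    # n separate requests: pay the fixed overhead every time.
    return n * (FIXED_MS + PER_QUERY_MS)

def cost_batched(n):
    # one request carrying n queries: pay the fixed overhead once.
    return FIXED_MS + n * PER_QUERY_MS
```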

> > A link hash is the hash of the associated changeset. Nodes always have
> > a second parent, though usually it's the nullid. All the groups are in
> > the same format, over the wire and on disk. The only differences are
> > what Mercurial uses them for - a changelog is different from a filelog.
> 
> Is there a wiki page that describes how changelogs, filelogs and
> manifests all link up?

There's mercurial/notes.txt, which is a start, and there's a wiki page
called Design that someone put up which attempts to make it more readable.

-- 
Mathematics is the supreme nostalgia of our time.
