New bundle format; code structure

Mon May 23 13:24:22 CDT 2011

On Mon, 2011-05-23 at 16:16 +0200, Sune Foldager wrote:
> I'm currently looking at and thinking about the new bundle format. Not just the
> format, but also how we'll implement it in the code. The current state of
> affairs is a slight mess, as I wrote a mail about recently: sometimes full
> bundles are used, often headerless ones. Sometimes there is compression,
> sometimes not, etc. I have tried to sum it up in the tables below:
> 
> Push:
> 
> proto    full bundle?    compression
> http     yes             any (capability), unchunked
> ssh      no              none, (streamed in 4k chunks)
> 
> Pull:
> 
> proto    full bundle?    compression
> http     no              ext. zlib (4k chunks, 64k for decomp.)
> ssh      no              none
> 
> Here "ext. zlib" means external zlib, as in: not being done by the bundler
> (in general, not much bundle-related IS done by the two bundler classes..
> they are a bit strange, really :p).

I think we're actually now at the point where ssh/http will _accept_ any
type of bundle. But we're conservative about what we send for
compatibility.

> Since we're introducing a new argument for getbundle anyway, wouldn't it be
> nicer if we could work towards something like this for pull:
> 
> Pull ng:
> 
> proto    full bundle?    compression
> http     yes             any (capability/server's choice, chunked)
> ssh      yes             any (capability/server's choice, chunked)

Sure.

> Default could then be zlib for http, none for ssh. Of course this will require
> shuffling some code around, possibly enhancing the capabilities of the bundling
> code and such, to ensure we can still do efficient chunked compressions and
> decompressions. I have a vague idea in mind I could make more concrete.
> 
> For push, we need a new unbundle command (like getbundle; sendbundle?), and we
> could perhaps "fix" ssh in that case as well, so we can unify it with http.

putbundle?

> Also, a draft of a new bundle format is here:
> http://mercurial.selenic.com/wiki/BundleFormat2

Thinking ahead a bit, we're going to want a way to specify 256-bit
hashes for whatever we replace SHA1 with.

> It's rough, and Benoit had some comments (on irc), which I forgot. I tried to
> include the following features:
> 
> - support for capabilities for future enhancements

Let's get our nomenclature right here:

- a capability is something a client can choose to support or can ignore
- a requirement is something a client must support, or abort

Note the asymmetry here: our repos have requirements but not
capabilities (this repo requires at least these features to be safely
used) and our servers have capabilities but not requirements (we're
happy to talk to anyone, just tell us what you want).

For the bundle format, it's obviously important that a client be able to
distinguish which is which. So we either want one list (requirements
ONLY), two lists, or a way to flag entries in one list.
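
To make the two-list option concrete, here is a minimal sketch in Python.
The feature names, the SUPPORTED set, and check_bundle_features are all
made up for illustration; none of them come from the draft format.

```python
# Hypothetical set of features this client understands.
SUPPORTED = {"changegroup", "zlib", "progress"}


class UnsupportedRequirement(Exception):
    """Raised when a bundle requires a feature the client lacks."""


def check_bundle_features(requirements, capabilities, supported=SUPPORTED):
    """Return the optional features the client will actually use.

    Every requirement must be supported, otherwise we abort.
    Capabilities the client does not know are silently ignored.
    """
    missing = sorted(set(requirements) - set(supported))
    if missing:
        raise UnsupportedRequirement(", ".join(missing))
    return sorted(set(capabilities) & set(supported))
```

A single flagged list would carry the same information; the client only
needs some way to tell which entries it may ignore.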

For comparison, PNG does a clever thing where each chunk has a feature
name with embedded flags based on capitalization indicating whether it's
a capability or requirement:

https://secure.wikimedia.org/wikipedia/en/wiki/Portable_Network_Graphics#.22Chunks.22_within_the_file

An IDAT chunk is the image data, so supporting it is obviously not optional
(a requirement). But a gAMA chunk with gamma settings is optional (a
capability). They've got two other flag bits here: public/private and
pass-through (safe-to-copy). Here, non-gAMA-aware clients are instructed
not to copy the gAMA chunk if they manipulate a non-optional chunk like
IDAT, but they're free to copy iTXt (text comment) chunks. We don't really
have the pass-through issue, but I think this design is pretty instructive.
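
For reference, the PNG convention can be decoded in a few lines. In PNG,
bit 5 (0x20) of each of the four type bytes is a flag, which is the same as
saying "this letter is lowercase". The function name png_chunk_flags is
mine; the bit layout is the one PNG defines.

```python
def png_chunk_flags(chunk_type):
    """Decode the four property bits of a 4-letter PNG chunk type.

    A lowercase letter means the corresponding bit (0x20) is set.
    """
    if len(chunk_type) != 4 or not chunk_type.isascii() or not chunk_type.isalpha():
        raise ValueError("chunk type must be four ASCII letters")
    b = chunk_type.encode("ascii")
    return {
        "ancillary": bool(b[0] & 0x20),     # optional (a "capability")
        "private": bool(b[1] & 0x20),       # not a publicly registered chunk
        "reserved": bool(b[2] & 0x20),      # reserved for future use
        "safe_to_copy": bool(b[3] & 0x20),  # the pass-through bit
    }
```

With this, IDAT comes out with all four flags clear, gAMA is ancillary but
not safe to copy, and iTXt is ancillary and safe to copy.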

One can imagine doing something similar with our bundle chunks. For
instance, the chunks for progress estimation are optional. We may have
other optional chunks in the future (like a bookmark chunk). 
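
As a rough sketch of what that could look like on the reading side, assume
(purely for illustration, this is not the BundleFormat2 draft) that each
chunk is a 4-byte ASCII name, a 4-byte big-endian length, and a payload,
and that a lowercase first letter marks the chunk as optional. The chunk
names below are hypothetical too.

```python
import struct

# Hypothetical chunk names this reader understands.
KNOWN = {"CHNG", "prog"}


def read_chunks(stream, known=KNOWN):
    """Yield (name, payload) for known chunks.

    Unknown optional chunks (lowercase first letter) are skipped;
    unknown required chunks abort the read.
    """
    while True:
        header = stream.read(8)
        if not header:
            return  # clean end of stream
        if len(header) < 8:
            raise ValueError("truncated chunk header")
        raw_name, length = struct.unpack(">4sI", header)
        name = raw_name.decode("ascii")
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated chunk payload")
        if name in known:
            yield name, payload
        elif name[0].islower():
            continue  # optional chunk we don't understand: ignore it
        else:
            raise ValueError("unsupported required chunk: %s" % name)
```

An old client reading a bundle with a new optional chunk just skips it,
while a new required chunk makes it abort instead of misreading the data.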

-- 
Mathematics is the supreme nostalgia of our time.