summary of bundle2 discussion.

Pierre-Yves David pierre-yves.david at ens-lyon.org
Tue Jan 6 22:48:46 UTC 2015


Here is a summary of the bundle2 feedback and discussion. It is meant to
gather the questions raised, along with a quick dump of my brain state
about each of them. This email is not meant to spark a new discussion.
My views are obviously not definitive; I expect each topic to be
rediscussed in its own thread in time (probably after the 3.3 release).
I may also have gotten some people's feedback wrong; I apologize in
advance wherever that happened.

Capability exchange
-------------------

Greg expressed confusion about the current capability scheme and
advocated for a more verbose form in which all available parts, and the
parameters of each of them, are listed.

I think that the current scheme (simple name+values express a whole set 
of parts and parameters) is simpler and good enough. However, Greg's 
initial confusion points to the need for real documentation of what is 
to be expected from each capability. This should prevent incompatible 
changes made by mistake in the future.


Mandatory vs advisory
---------------------

Greg's stance is that advisory parameters are useless, as the client
should never send information the server can't process. This is
consistent with his position in favor of full discovery.

Mike is concerned about protocol evolution causing the mandatory and
advisory status of items to change over time. He also requested actual
examples of advisory parameters. (See email for details:
http://www.selenic.com/pipermail/mercurial-devel/2014-December/064966.html)

My view: One of the reasons for advisory parameters is to have a less
complicated discovery of server/client capabilities and configuration.
There are a good many small pieces of information that are not critical
and could be included in all cases (eg: number of changesets in a
changegroup, request for verbose output). Not having to do fine-grained
discovery here will simplify the code in multiple places. In my opinion,
this disagreement between Greg and me is an echo of the disagreement on
the capability discovery topic. Both of our positions appear consistent
with our respective stances on discovery.

I'm not too concerned about the protocol drift pointed out by Mike. The
mandatory/advisory statuses are not at a “protocol” level (part X must
be included in all bundles ever) but apply to a specific bundle
(part/parameter X must be processed to process this very bundle
properly), so we still have room to deprecate parts and make the
protocol evolve. We have done a good job in Mercurial with this approach
so far.

The request for examples is very valid and I'll try to address it. The
bundle2 wiki page already has some examples, but they are not very
good.
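
To make the per-bundle meaning of mandatory vs advisory concrete, here
is a minimal sketch. It is not the bundle2 code: the `mandatory` flag,
the handler table, and the part names in the usage note are
hypothetical, chosen only to illustrate the rule that an unknown
mandatory part aborts processing of this bundle while an unknown
advisory part is skipped.

```python
class UnsupportedPartError(Exception):
    """Raised when a mandatory part has no handler on the receiving side."""


def process_bundle(parts, handlers):
    """Process an iterable of (name, mandatory, payload) tuples.

    `handlers` maps part names to callables (hypothetical structure).
    Returns (results, skipped), where `skipped` lists the advisory
    parts that were ignored because nothing could handle them.
    """
    results = []
    skipped = []
    for name, mandatory, payload in parts:
        handler = handlers.get(name)
        if handler is None:
            if mandatory:
                # This very bundle cannot be applied properly: abort.
                raise UnsupportedPartError(name)
            # Advisory: safe to ignore, no prior discovery needed.
            skipped.append(name)
            continue
        results.append(handler(payload))
    return results, skipped
```

With this shape, a sender can always attach something like a
"verbose-output" hint without first discovering whether the receiver
understands it, which is the simplification argued for above.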


Pure data content vs process related content
--------------------------------------------

On-list and off-list discussions highlighted that bundle2 currently
carries two kinds of data:

- Actual "repository" data (changegroup, phases, bookmarks, …),
- Exchange-related data (output, list of heads for race detection, reply
capabilities, …)

These two kinds of data are carried in the same stream using bundle2-parts.


The distinction between these two kinds of data is becoming very clear,
but I still fail to see a major issue in mixing them in the same stream
(though I can foresee some progress- or output-related headaches).
Having a single mechanism to handle all data is definitely a win.

However, disregarding such a core difference this early looks like the
kind of terrible mistake that will haunt us for years.

More thinking on this topic is required.


Framed protocol vs single stream + interrupt
--------------------------------------------

(This is a vast topic, so this summary will likely misrepresent it.)

Greg was advocating for a multi-channel, framed protocol. This more
modern approach would handle parallel processing more elegantly (eg: it
could handle the data/non-data distinction quite well). The stream would
be split into "frames" that can each be read and processed in one go.
The frame processors would be responsible for stitching "frame
continuations" back together for parts that do not fit into a single
frame.
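
As a rough illustration of the framing idea described above (not
Greg's actual proposal, whose wire details are not given here), the
sketch below assumes a hypothetical 6-byte frame header made of a
channel id, a flags byte, and a payload length, with one flag marking
"more frames follow for this payload".

```python
import struct

# Hypothetical header: channel (1 byte), flags (1 byte), length (4 bytes).
FRAME_HEADER = struct.Struct(">BBI")
FLAG_CONTINUES = 0x01  # more frames follow for the same payload


def split_into_frames(channel, payload, max_frame_size):
    """Cut one payload into frames of at most max_frame_size body bytes."""
    chunks = [payload[i:i + max_frame_size]
              for i in range(0, len(payload), max_frame_size)] or [b""]
    frames = []
    for index, chunk in enumerate(chunks):
        flags = FLAG_CONTINUES if index < len(chunks) - 1 else 0
        frames.append(FRAME_HEADER.pack(channel, flags, len(chunk)) + chunk)
    return frames


def reassemble(frames):
    """Stitch continuation frames back together, per channel.

    Returns (channel, payload) tuples in the order payloads complete.
    """
    pending = {}
    complete = []
    for frame in frames:
        channel, flags, length = FRAME_HEADER.unpack_from(frame)
        body = frame[FRAME_HEADER.size:FRAME_HEADER.size + length]
        pending.setdefault(channel, bytearray()).extend(body)
        if not flags & FLAG_CONTINUES:
            complete.append((channel, bytes(pending.pop(channel))))
    return complete
```

Frames from different channels can be interleaved freely on the wire;
the per-channel buffering in `reassemble` is the extra bookkeeping that
a single-stream format does not need.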

While this seems a more modern and powerful approach, it is also a
significantly more complicated implementation, in particular when it
comes to having a simple interface for part generation and consumption
as we have now. It also makes the whole bundle processing and scheduling
more complicated.

My current position is: I'm not sure the added complexity is worth the
benefit. I hope the side-channel ability of the current format will be
enough to handle any multi-channel need in the future. If not, it will
be time for a bundle3 implementation. I feel these new ideas are coming
way too late (6+ months) in the process, and I think we should aim at
stabilizing the current format instead of reworking the whole stack.


Parameters encoding
-------------------

The stream-level parameters are encoded using a mix of binary and text
encoding. The part-level parameters are encoded using a pure binary
encoding with some weirdness.

Greg has pointed this out as inconsistent; I think this is a very valid
point and should be addressed.


Chunk size
----------

Greg pointed out that we should probably restrict the maximum size of a
part chunk. This is to prevent memory allocation attacks on small
systems.

Arguably, someone with push access to the system has a lot of other ways
to DoS it with bundle content. However, avoiding protocol-level attack
vectors seems very sensible.
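
A minimal sketch of what such a limit could look like on the reading
side. The 4-byte unsigned big-endian length prefix and the 1 MiB cap are
assumptions made for the example, not the actual bundle2 chunk format
or an agreed-upon value.

```python
import struct

MAX_CHUNK_SIZE = 1 << 20  # hypothetical cap (1 MiB), not a decided value


class ChunkTooLargeError(Exception):
    """Raised when a chunk announces a size above the allowed maximum."""


def read_chunk(stream, max_size=MAX_CHUNK_SIZE):
    """Read one length-prefixed chunk, refusing oversized announcements.

    The size is checked before any payload is read, so a hostile
    header cannot force a large allocation.
    """
    header = stream.read(4)
    if len(header) != 4:
        raise EOFError("truncated chunk header")
    (size,) = struct.unpack(">I", header)
    if size > max_size:
        raise ChunkTooLargeError(
            "chunk of %d bytes exceeds limit of %d" % (size, max_size))
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("truncated chunk payload")
    return data
```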


Bundle2 for every command
-------------------------

The question of using bundle2 for all wire operations (including
read-only operations) was raised. This is a valid question with
basically two answers:

- It did not happen because nobody wrote any code for it.
- We cannot use bundle2 for read-only requests (the client → server
flow) because some HTTP servers will reject POST requests from
unauthenticated users.


Endianness
----------

Greg argued for moving to little endian encoding.

I think we should stick to network-endian (big-endian) for consistency 
with the rest of the world.
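
For readers less familiar with the distinction, the difference is only
the byte order of multi-byte integers on the wire. A small illustration
using Python's `struct` module (the value is arbitrary):

```python
import struct

value = 0x01020304

# ">" is big-endian; "!" (network order) is the same thing in struct.
big = struct.pack(">I", value)
network = struct.pack("!I", value)

# "<" is little-endian: least significant byte first.
little = struct.pack("<I", value)

assert big == network == b"\x01\x02\x03\x04"
assert little == b"\x04\x03\x02\x01"
```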


Part size
---------

Greg pointed out that parts tend to be very large (eg: changegroup) and
could be much smaller if we had more specialized code.

I think this is a valid point, but nobody has written any code in that
direction yet.

-- 
Pierre-Yves David


