summary of bundle2 discussion.
Pierre-Yves David
pierre-yves.david at ens-lyon.org
Tue Jan 6 22:48:46 UTC 2015
Here is a summary of the bundle2 feedback and discussion. It is meant to
gather the questions raised, with a quick dump of my brain state about
each of them. This email is not meant to spark a new discussion. My
views are obviously not definitive; I expect each topic to be
rediscussed in its own thread in time (probably after the 3.3 release).
I may also have gotten some people's feedback wrong; I apologize in
advance where that happened.
Capability exchange
-------------------
Greg expressed confusion about the current capability scheme and
advocated for a more verbose form where all available parts, and the
parameters of each of them, are listed.
I think that the current scheme (simple name+values express a whole set
of parts and parameters) is simpler and good enough. However, Greg's
initial confusion points to the need for real documentation of what is
to be expected from each capability. This should prevent incompatible
changes made by mistake in the future.
Mandatory vs advisory
---------------------
Greg's stance is that advisory parameters are useless, as the client
should never send information the server can't process. This is
consistent with his position in favor of full discovery.
Mike is concerned about protocol evolution causing mandatory and
advisory status to change over time. He also requests actual examples
of advisory parameters. (See email for details:
http://www.selenic.com/pipermail/mercurial-devel/2014-December/064966.html)
My view: One of the reasons for advisory parameters is to have less
complicated discovery of server/client capabilities and configuration.
There are a good number of small pieces of information that are not
critical and could be included in all cases (eg: number of changesets in
a changegroup, request for verbose output). Not having to make
fine-grained discovery here will simplify the code in multiple places.
In my opinion, this disagreement between Greg and me is an echo of the
disagreement on the capability discovery topic. Both of our positions
appear consistent with our respective stances on discovery.
I'm not too concerned about the protocol drift pointed out by Mike. The
mandatory/advisory statuses are not at a "protocol" level (part X must
be included in all bundles ever) but apply to a specific bundle
(part/parameter X must be processed to process this very bundle
properly), so we still have room to deprecate parts and make the
protocol evolve. We have done a good job in Mercurial with this approach
so far, so I'm not too concerned.
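To illustrate the per-bundle meaning of mandatory and advisory, here is
a minimal sketch. It is not bundle2's actual code: the `Part` type, the
`HANDLERS` table and the part names are hypothetical, chosen only for
illustration.

```python
from dataclasses import dataclass


class UnsupportedPartError(Exception):
    """Raised when a mandatory part has no known handler."""


@dataclass
class Part:
    name: str
    mandatory: bool
    payload: bytes = b""


# Hypothetical handler table: part name -> function returning a result.
HANDLERS = {
    "changegroup": lambda part: "applied %d bytes" % len(part.payload),
}


def process_bundle(parts):
    """Process each part; the mandatory flag only concerns this bundle."""
    results = []
    for part in parts:
        handler = HANDLERS.get(part.name)
        if handler is None:
            if part.mandatory:
                # The sender says this bundle cannot be applied without it.
                raise UnsupportedPartError(part.name)
            # Advisory and unknown: safe to skip.
            continue
        results.append(handler(part))
    return results
```

In this sketch an unknown advisory part is dropped without any prior
discovery, while an unknown mandatory part aborts processing of that
bundle only.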
The request for examples is very valid and I'll try to address it.
The bundle2 wiki page already has some examples, but they are not very
good.
Pure data content vs process related content
--------------------------------------------
On-list and off-list discussions highlighted that bundle2 currently
carries two kinds of data.
- Actual "repository" data (changegroup, phase, bookmarks, ...),
- Exchange related data (output, list of heads for race detection, reply
capability, ...)
These two kinds of data are carried in the same stream using bundle2-parts.
The distinction between these two kinds of data is becoming very clear.
But I still fail to see a major issue in mixing them in the same stream.
(I can maybe foresee some progress or output related headaches.) Having
a single mechanism to handle all data is definitely a win.
However, disregarding such a core difference early on screams like a
terrible, terrible mistake that will haunt us for years.
More thinking on this topic is required.
Framed protocol vs single stream + interrupt
--------------------------------------------
(this is a vast topic so this summary will likely misrepresent it)
Greg was advocating for using a multi-channel, framed protocol. This
more modern approach would handle parallel processing more elegantly
(eg: it could handle the data/non-data distinction quite well). The
stream would be split into "frames" that can be read and processed in
one go. The frame processors would be responsible for sticking back
together "frame continuations" for parts that do not fit into a
single frame.
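As a rough illustration of the idea (not Greg's actual proposal, whose
details are not given here), frames and their continuations could look
like the sketch below. The header layout (channel id, continuation flag,
length) and the function names are assumptions made for this example.

```python
import struct

# Hypothetical frame header: channel id (1 byte), "more frames follow"
# flag (1 byte), payload length (4 bytes, big-endian).
HEADER = struct.Struct(">BBI")


def to_frames(channel, payload, max_size):
    """Split one part payload into frames of at most max_size bytes."""
    chunks = [payload[i:i + max_size]
              for i in range(0, len(payload), max_size)] or [b""]
    frames = []
    for index, chunk in enumerate(chunks):
        more = 1 if index < len(chunks) - 1 else 0
        frames.append(HEADER.pack(channel, more, len(chunk)) + chunk)
    return frames


def reassemble(stream):
    """Yield (channel, payload) once a part's last frame has been read."""
    pending = {}  # channel id -> chunks received so far
    offset = 0
    while offset < len(stream):
        channel, more, size = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        pending.setdefault(channel, []).append(stream[offset:offset + size])
        offset += size
        if not more:
            yield channel, b"".join(pending.pop(channel))
```

Frames from different channels can be interleaved on the wire; the
reader has to keep per-channel state until the last frame of each part
arrives, which is part of the extra complexity discussed next.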
While this seems a more modern and powerful approach, it is also a
significantly more complicated implementation, in particular when it
comes to having a simple interface for part generation and consumption
as we have now. It also makes the whole bundle processing and scheduling
more complicated.
My current position is: I'm not sure the added complexity is worth the
benefit. I hope the side-channel ability of the current format will be
enough to handle any multi-channel need in the future. If not, it will
be time for a bundle3 implementation. I feel these new ideas are coming
way too late (6+ months) in the process; I think we should aim at
stabilizing the current format instead of reworking the whole stack.
Parameters encoding
-------------------
The stream level parameters are encoded using a mix of binary and text
encoding. The part level parameters are encoded using a pure binary
encoding with some weirdness.
Greg has pointed this out as inconsistent; I think this is a very valid
point and should be addressed.
Chunk size
----------
Greg points out that we should probably restrict the maximum size of a
part chunk. This is to prevent memory allocation attacks on small
systems. Arguably, someone with push access to the system has a lot of
other ways to DoS it with bundle content. However, avoiding protocol
level attack vectors seems very sensible.
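A minimal sketch of such a guard, assuming each chunk is prefixed by a
4-byte big-endian length. The limit value, the framing and the names
below are assumptions for illustration, not the actual bundle2 reader.

```python
import io
import struct

# Assumed limit for illustration; the discussion does not settle on a value.
MAX_CHUNK_SIZE = 1 << 20  # 1 MiB


class ChunkTooLargeError(Exception):
    """Raised when a chunk announces more bytes than we accept."""


def read_chunk(stream, max_size=MAX_CHUNK_SIZE):
    """Read one chunk prefixed by a 4-byte big-endian length."""
    header = stream.read(4)
    if len(header) != 4:
        raise EOFError("truncated chunk header")
    (size,) = struct.unpack(">I", header)
    if size > max_size:
        # Refuse before allocating anything based on the announced size.
        raise ChunkTooLargeError(size)
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("truncated chunk payload")
    return data
```

The important point is that the announced size is checked before any
read or allocation that depends on it.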
Bundle2 for every command
-------------------------
The question of using bundle2 for all wire operations (including
read-only operations) was raised. This is a valid question with
basically two answers:
- It did not happen because nobody wrote any code for it.
- We cannot use bundle2 for read-only requests (the client -> server flow)
because some HTTP servers will reject POST requests from unauthenticated users.
Endianness
----------
Greg argued for moving to little endian encoding.
I think we should stick to network-endian (big-endian) for consistency
with the rest of the world.
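For reference, the difference between the two byte orders, shown with
Python's standard `struct` module on an arbitrary example value:

```python
import struct

value = 0x01020304

big = struct.pack(">I", value)     # network byte order (big-endian)
little = struct.pack("<I", value)  # little-endian

assert big == b"\x01\x02\x03\x04"
assert little == b"\x04\x03\x02\x01"
# "!" is struct's alias for network (big-endian) byte order.
assert struct.pack("!I", value) == big
```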
Part size
---------
Greg pointed out that parts tend to be very large (eg: changegroup) and
could be much smaller if we had more specialized code.
I think this is a valid point but nobody has written any code in that
direction yet.
--
Pierre-Yves David