Review of bundle2: capabilities exchange

Gregory Szorc gregory.szorc at gmail.com
Fri Dec 19 07:43:47 UTC 2014


Here are my thoughts on bundle2 capabilities exchange.

b2x: prefix
-----------

It wasn't entirely obvious to me that the "b2x:" prefix really means 
"available part handler." (Or at least that's what I think it means.)

I'd really like to see some documentation around what goes in the b2 
capabilities dict. Some more formalization also couldn't hurt. e.g. why 
isn't there a prefix for stream level parameters?

undefined behavior with unknown part parameters
-----------------------------------------------

I love how the part type/handlers declare and advertise (via 
capabilities) their parameter names. This ensures peers are in a good 
position to make intelligent choices.

What isn't really well defined is how we're going to version things 
going forward. When a new parameter name is invented, are we going to 
guarantee backwards compatibility for all of eternity like we do now? 
Or, are we going to freeze each part type and invent a new part with the 
new metadata? Both have advantages and disadvantages. I'm a fan of 
creating new types to represent the different behaviors rather than 
trying to keep tacking things on forever. It does result in protocol 
bloat. But it keeps processing semantics well-defined and helps avoid 
ambiguities. I think this would also help with the mandatory/advisory foo.

mandatory and advisory parameters
---------------------------------

Remotes advertise the set of supported parameters on each part type. 
This means that any reasonably behaving client will *never* send a 
payload to the server that the server is incapable of processing. This 
means that mandatory and advisory flags have little to no meaning from 
client -> server.

This relegates mandatory and advisory flags to server -> client and "at 
rest" scenarios.

For server -> client, we don't currently advertise which parameters are 
mandatory vs advisory. At least not as far as I can tell: I couldn't 
find a single @parthandler with an upper case parameter name. For 
extensibility purposes (and this goes back to the previous section), I 
would think that we should have a more well-defined advertisement of 
what exactly the server is expecting with each part. Is a parameter 
mandatory? What's its default value if not?

I think it is important for servers to advertise this because it results 
in a better user experience. For example, a client may assume that a 
parameter is advisory and it may leave it out of the generated bundle2 
payload. However, the server may logically require the parameter (this 
is slightly different from how mandatory/advisory works today - we would 
throw an exception inside the @parthandler rather than inside the bit 
that parses the part headers). From the user's perspective, they just 
spent 5 minutes transferring a large payload to the server only to find 
out that a missing parameter was mandatory. That's not cool. Servers 
should advertise exactly what parameters they are expecting. 
Alternatively, we should have well-defined and frozen behaviors for each 
part type and we should version part types when new behavior is wanted. 
Again, see the above section.

lack of forward definition of parts and parameters
--------------------------------------------------

For at-rest bundles (bundles that are saved and read later - as opposed 
to bundles that are dynamically generated when capabilities are known 
exactly), we don't have the luxury of knowing exactly what the repo 
supports and thus can't avoid the potential of the repo encountering 
"unknown" data.

In this scenario, we could very well have a very large bundle and 
encounter an unknown part at the tail end. This is a bad user experience.

I'd like to see the ability for readers to avoid having to process an 
entire bundle before discovering they are incapable of fully processing 
it. Maybe we need the ability to forward declare parts (and possibly 
their parameters). Maybe we just need a reading mode that does an 
initial pass over the part types and parameters without processing.

This feels like a low priority to me. But I'd like to see it addressed 
because waiting minutes to discover an error is extremely painful.

server-required part types
--------------------------

Similar in vein to my request for servers to advertise which parameters 
are mandatory, I think servers should advertise which part types are 
mandatory. This opens the door to servers requiring clients to meet 
certain minimal requirements before pushing. e.g. servers may want to 
require a "GPG part" with signatures encapsulating the changes they are 
about to make. Or something more realistic, when obsolescence becomes 
more mainstream, we want to try to enforce the requirement that only 
obsolescence aware clients send us data. Having part type presence 
requirements is a way of establishing minimum client requirements very 
early in the push process, before the client has potentially wasted 
minutes sending data to the server.

I know this feature is somewhat against the strong backwards compatible 
nature of Mercurial. But as someone who consistently deals with problems 
reported by people who aren't using a modern and ideal Mercurial 
installation or configuration, I'd really like more levers to enforce 
best practices. (I still want to polish off my "best practices" 
extension so servers can inform clients on how to "comply" with "best 
practices.)


More information about the Mercurial-devel mailing list