Review of bundle2: capabilities exchange

Tue Dec 23 15:56:26 CST 2014

On 12/19/14 5:29 PM, Pierre-Yves David wrote:
>
>
> On 12/18/2014 11:43 PM, Gregory Szorc wrote:
>> Here are my thoughts on bundle2 capabilities exchange.
>>
>> b2x: prefix
>> -----------
>>
>> It wasn't entirely obvious to me that the "b2x:" prefix really means
>> "available part handler." (Or at least that's what I think it means.)
>
> It does not means what you think it means. the 'b2x:' prefix is here
> because the protocol is experimental and we want to avoid collision in
> case we break backward compatibility.
>
> The content of the capability dict is not strictly aimed at part handler
> either. The goal is to informs about available feature. And it is
> currently maintly used to says "hey I've this new part handler
> available" but there will be case where a capability is just about some
> stream level thing, or a single capability informs about multiple part.
>
> The whole capability dict will probably be cleaned when we drop the
> experimental flag on bundle2.
>
>> I'd really like to see some documentation around what goes in the b2
>> capabilities dict. Some more formalization also couldn't hurt. e.g. why
>> isn't there a prefix for stream level parameters?
>
> I hope the previous paragraph address this question.
>
>> undefined behavior with unknown part parameters
>> -----------------------------------------------
>>
>> I love how the part type/handlers declare and advertise (via
>> capabilities) their parameter names. This ensures peers are in a good
>> position to make intelligent choices.
>
> (note that is not strictly what happen, but the result is the same)
>
>> What isn't really well defined is how we're going to version things
>> going forward. When a new parameter name is invented, are we going to
>> guarantee backwards compatibility for all of eternity like we do now?
>
> Yes, we'll maintain backward compatibility (or at least feature potency
> capability) for ever.
>
>> Or, are we going to freeze each part type and invent a new part with the
>> new metadata? Both have advantages and disadvantages. I'm a fan of
>> creating new types to represent the different behaviors rather than
>> trying to keep tacking things on forever.
>
> Both approach are not incompatible. We can alter existing part when it
> make sense and introduce new one deprecated the other when it is smarter.
>
>> It does result in protocol
>> bloat. But it keeps processing semantics well-defined and helps avoid
>> ambiguities. I think this would also help with the mandatory/advisory
>> foo.
>
> I do not get this last part.
>
>> mandatory and advisory parameters
>> ---------------------------------
>>
>> Remotes advertise the set of supported parameters on each part type.
>> This means that any reasonably behaving client will *never* send a
>> payload to the server that the server is incapable of processing. This
>> means that mandatory and advisory flags have little to no meaning from
>> client -> server.
>  >
>  > This relegates mandatory and advisory flags to server -> client and "at
>  > rest" scenarios.
>
> (so again, the advertising is not strict. This confusion will probably
> keep bluring things for the rest of this email)
>
> Advisory parameters make sens in a lot of case, and handling the
> discovery and conditional handling for all of them would be tedious and
> bloating. Here are quick example of advisory parameters that could be
> blindly added to a part:
>
>   - verbose: "please produce verbose output if possible"
>   - lenght of "data": Can be used to produce progress information, but
> non critical if not supported.
>
>> For server -> client, we don't currently advertise which parameters are
>> mandatory vs advisory.
>
> The client send its capability when requesting a bundle.
>
>> At least not as far as I can tell: I couldn't
>> find a single @parthandler with an upper case parameter name.
> Upercase is not involved in the mandatoryness of a parameter.
>
>> For
>> extensibility purposes (and this goes back to the previous section), I
>> would think that we should have a more well-defined advertisement of
>> what exactly the server is expecting with each part. Is a parameter
>> mandatory? What's its default value if not?
>
> I think the features + options approach to capability is good enough
> here. And have been mostly good enough for mercurial itself.
>
>> I think it is important for servers to advertise this because it results
>> in a better user experience. For example, a client may assume that a
>> parameter is advisory and it may leave it out of the generated bundle2
>> payload.
>
> We needs to have non-ambiguous discovery and non-buggy client. But his
> can be achieve with the current infrastructure.
>
>
>> However, the server may logically require the parameter (this
>> is slightly different from how mandatory/advisory works today - we would
>> throw an exception inside the @parthandler rather than inside the bit
>> that parses the part headers).
>
> If the parameter is logically needed, it should be mandatory. If the
> client generate buggy bundle, the user will have a bad experience anyway.
>
>
>> From the user's perspective, they just
>> spent 5 minutes transferring a large payload to the server only to find
>> out that a missing parameter was mandatory. That's not cool. Servers
>> should advertise exactly what parameters they are expecting.
>> Alternatively, we should have well-defined and frozen behaviors for each
>> part type and we should version part types when new behavior is wanted.
>> Again, see the above section.
>>
>> lack of forward definition of parts and parameters
>> --------------------------------------------------
>>
>> For at-rest bundles (bundles that are saved and read later - as opposed
>> to bundles that are dynamically generated when capabilities are known
>> exactly), we don't have the luxury of knowing exactly what the repo
>> supports and thus can't avoid the potential of the repo encountering
>> "unknown" data.
>>
>> In this scenario, we could very well have a very large bundle and
>> encounter an unknown part at the tail end. This is a bad user experience.
>>
>> I'd like to see the ability for readers to avoid having to process an
>> entire bundle before discovering they are incapable of fully processing
>> it. Maybe we need the ability to forward declare parts (and possibly
>> their parameters). Maybe we just need a reading mode that does an
>> initial pass over the part types and parameters without processing.
>>
>> This feels like a low priority to me. But I'd like to see it addressed
>> because waiting minutes to discover an error is extremely painful.
>
> This could be easily implemented with a part at the beginning of the
> bundle. Such part would list all the mandatory part and parameters in
> the bundle to allow early failure.
>
>> server-required part types
>> --------------------------
>>
>> Similar in vein to my request for servers to advertise which parameters
>> are mandatory, I think servers should advertise which part types are
>> mandatory. This opens the door to servers requiring clients to meet
>> certain minimal requirements before pushing. e.g. servers may want to
>> require a "GPG part" with signatures encapsulating the changes they are
>> about to make. Or something more realistic, when obsolescence becomes
>> more mainstream, we want to try to enforce the requirement that only
>> obsolescence aware clients send us data. Having part type presence
>> requirements is a way of establishing minimum client requirements very
>> early in the push process, before the client has potentially wasted
>> minutes sending data to the server.
>
> forward compatibility with future required is hard. However it is easy
> to arrange for the series to issue a special capability saying "I
> request client to send part X" and ensure it is the case at the
> beginning of the stream. Old client would then fail early.
>
>> I know this feature is somewhat against the strong backwards compatible
>> nature of Mercurial. But as someone who consistently deals with problems
>> reported by people who aren't using a modern and ideal Mercurial
>> installation or configuration, I'd really like more levers to enforce
>> best practices. (I still want to polish off my "best practices"
>> extension so servers can inform clients on how to "comply" with "best
>> practices.)
>
> Mercurial is commited to backward compatibility -by-default-. I find it
> perfectly reasonable for an organization to enforce people to upgrade.
>

I'm just gotten to bottom reply to the whole message.

We have 2 types of exchanges: offline and online. In the online scenario 
(client-server, not at-rest bundles), I strongly believe that if a 
bundle2 payload is generated that the peer cannot fully process, then 
we've failed. "capabilities" is essentially content negotiation. And a 
goal of content negotiation is to know what the other side can accept so 
you don't send something they won't be able to read.

If we have 2 peers exchanging capabilities, under no circumstances 
should we be sending parts the other side can't understand. This means:

* "capabilities" should include the set of which parts can be processed.
* "capabilities" should include the parameters accepted by the different 
parts *or* the set of parameters attached to a part must be immutable 
after that part is defined/finalized (to ensure clients don't need 
unrecognized part parameters)
* "advisory" and "mandatory" should be completely irrelevant for wire 
protocol usages
* Forward-defining parts is not an issue

This touches on a point from my 3rd thread reviewing bundle2: the 
blurred line between bundle2 as a data representation format and bundle2 
as a wire protocol. Both have separate concerns. But we're shoehorning 
them together. I'd really like to see us start differentiating online vs 
offline usage of this format. I think when we do that, we'll realize 
that things like capped frame size and mandatory/advisory foo only 
applies to online usage.