[PATCH 3 of 7] dispatch: rework the serve --stdio safe argument checks

Mon Jul 16 06:48:21 EDT 2018

On Sun, 15 Jul 2018 12:33:48 +0200, Boris FELD wrote:
> The properties that are important for us are the following and we will 
> explain why a bit further:
>      - We want to activate it as soon as possible (ssh: dispatch level, 
> http: wsgi server),
>      - Controlled by the client,
>      - Security managed by core.
> 
> Yuya would like to move the implementation outside of core dispatch, for 
> security and maintenance concerns. Either, into custom ssh script, or at 
> the protocol level. Neither of these two locations fits our needs (and 
> constraints).
> 
> So we need to find a way to do this early extra validation of parameter 
> in a way that is good enough in terms of security and maintenance.
> 
> # Longer explanation
> 
> ## Why does it has to be set up so early?
> 
> A significant part of the slowness we observe happen very early, during 
> the initial setup (extensions setup, various data loading, custom 
> extensions code, etc...). So the earlier the extra debug/profile is 
> setup, the more useful data we gather.

I don't disagree with that, but I just doubt if that would be worth making
the dispatcher hack complicated and adding more variations of ssh entry
arguments. Honestly, I'm thinking that the use of "hg serve" as an entry
point was a design mistake which we can't fix, so I don't think it's good
idea to extend the syntax of the entry point.

> Doing this at the protocol level would be too late, as a lot of the 
> important time-consuming activities have already happened.
> 
> ## Why control it from the client instead of being a custom ssh script?
> 
> Some of the people we work with have a high level of security (eg: we 
> don't have a full shell access on their server and we only have access 
> to a subset of their repositories).
> 
> This means multiples things:
> 
> * We still want some security management, using `hg debugserve` would 
> allow arbitrary code to be executed, something we still want to prevent.
> 
> * Deploying a new ssh script is something complex that takes time. 
> Admins have to look at the new script and deploy it. It can take 
> multiple days, making us unable to collect these valuable data (in time).
> 
> * If we, mercurial core developers, are afraid of getting the validation 
> logic in core wrong, I don't think we should expect people not familiar 
> with Mercurial to be able to get that right. So moving the 
> responsibility to validate input to admins seems dangerous.
> 
> * Having an example script in `contrib/` would not help much as people 
> usually already have some custom script.

Why? If there's already a custom script installed, IIUC, only thing the
sysadmin will have to do is to replace 'hg serve' with
/path/to/contrib/hg-ssh-script, one time. After that, you can inject
commands understood by that script via sys.argv or stdin.

If the server has a robust pre-validation logic of 'hg serve' command,
or hg-ssh is used for example, argv injection wouldn't work anyway unless
the validation rule is relaxed or the existing script is adjusted to allow
extra command arguments.

> * Controlling option from the client is critical. When investigating 
> something, we try various options to look at the issue from different 
> angles (profile cpu vs time). Having to redeploy a new script every time 
> we want to change an option break our ability to investigate quickly as 
> it often requires synchronization between different teams.
> 
> ## Why do you want this into core
> 
> * Make it much simpler for us to use it at more place (including to help 
> random people on the list and bug tracker). Having to deploy a custom 
> Mercurial binaries on the server is a strong barrier.
> 
> * Given how valuable this is to us, we expect it to be valuable for 
> others. In fact server-side debug message is something various users 
> asked for in the past.

Well, server-side "debug" message could be handled at protocol level
unlike profiling of the early uisetup() thingy. That said, I'm slightly
interested in how ssh issues are debugged in various production environments
as there might be a better approach.