[PATCH STABLE] help: machines help topic

Matt Mackall mpm at selenic.com
Mon Jul 20 16:52:31 CDT 2015


On Sat, 2015-07-18 at 17:12 -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1437264628 25200
> #      Sat Jul 18 17:10:28 2015 -0700
> # Node ID 28b1e76a85fb542e0fdfb1f383733192c83f14ca
> # Parent  eabba9c75061254ff62827f92df0f32491c74b3d
> help: machines help topic
> 
> There are a lot of non-human consumers of Mercurial. And the challenges
> and considerations for machines consuming Mercurial are significantly
> different from what humans face.
> 
> I think there are enough special considerations around how machines
> consume Mercurial that a dedicated help topic is warranted. I concede
> the audience for this topic is probably small compared to the general
> audience. However, lots of normal Mercurial users do things like create
> one-off shell scripts for common workflows, so I think this is useful
> enough to be in the install (as opposed to, say, a wiki page, which
> most users will likely never find).
> 
> This text is by no means perfect. But you have to start somewhere. I
> think I did cover the important parts, though.
> 
> diff --git a/mercurial/help.py b/mercurial/help.py
> --- a/mercurial/help.py
> +++ b/mercurial/help.py
> @@ -165,8 +165,9 @@ helptable = sorted([
>      (["glossary"], _("Glossary"), loaddoc('glossary')),
>      (["hgignore", "ignore"], _("Syntax for Mercurial Ignore Files"),
>       loaddoc('hgignore')),
>      (["phases"], _("Working with Phases"), loaddoc('phases')),
> +    (['machines'], _('Mercurial for Machines'), loaddoc('machines')),

I like the general idea of this, but I agree this topic should be called
scripting.

>  ])
>  
>  # Map topics to lists of callable taking the current topic help and
>  # returning the updated version
> diff --git a/mercurial/help/machines.txt b/mercurial/help/machines.txt
> new file mode 100644
> --- /dev/null
> +++ b/mercurial/help/machines.txt
> @@ -0,0 +1,131 @@
> +It is common for machines (as opposed to humans) to consume Mercurial.
> +This help topic describes some of the considerations for interfacing
> +machines with Mercurial.
> +
> +Choosing an Interface
> +=====================
> +
> +Machines have a choice of several methods to interface with Mercurial.
> +These include:
> +
> +- Executing the ``hg`` process
> +- Querying a HTTP server
> +- Calling out to a command server
> +- Custom extensions
> +
> +Executing ``hg`` processes is very similar to how humans interact with
> +Mercurial in the shell. It should already be familiar to you.
> +
> +:hg:`serve` can be used to start a server. By default, this will start
> +a "hgweb" HTTP server. This HTTP server has support for machine-readable

"an"

> +output, such as JSON. For more, see :hg:`help hgweb`.
> +
> +:hg:`serve` can also start a "command server." Clients can connect
> +to this server and issue Mercurial commands over a special protocol.
> +For more details on the command server, including links to client
> +libraries, see https://mercurial.selenic.com/wiki/CommandServer.
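For reference, a minimal Python sketch of the framing that the wiki page above describes, assuming the pipe mode started with `hg serve --cmdserver pipe`: a request is the command name on its own line, then a 4-byte big-endian length and the NUL-separated arguments; replies arrive as frames of a 1-byte channel name plus a 4-byte big-endian length. The helper names (`encode_runcommand`, `read_frame`, `runcommand`) are hypothetical, and input requests are not handled.

```python
import io
import struct


def encode_runcommand(args):
    """Build a 'runcommand' request from a list of byte-string arguments."""
    data = b"\0".join(args)
    return b"runcommand\n" + struct.pack(">I", len(data)) + data


def read_frame(stream):
    """Read one frame: 1-byte channel, 4-byte big-endian length, payload.

    For the input channels 'I' and 'L' no payload follows; the length is
    how much input the server is asking for, so it is returned instead.
    """
    header = stream.read(5)
    if len(header) < 5:
        raise EOFError("command server closed the connection")
    channel, length = struct.unpack(">cI", header)
    if channel in (b"I", b"L"):
        return channel, length
    return channel, stream.read(length)


def result_code(payload):
    """The 'r' channel carries the exit status as a signed 4-byte int."""
    return struct.unpack(">i", payload)[0]


def runcommand(proc, args):
    """Send one command and collect its 'o' output until the 'r' frame."""
    proc.stdin.write(encode_runcommand(args))
    proc.stdin.flush()
    out = []
    while True:
        channel, payload = read_frame(proc.stdout)
        if channel == b"o":
            out.append(payload)
        elif channel == b"r":
            return result_code(payload), b"".join(out)
        elif channel in (b"I", b"L"):
            raise RuntimeError("input requested; not handled in this sketch")
        # Other channels ('e' errors, 'd' debug) are read and discarded.
```

Usage, assuming `hg` is on PATH and the script runs inside a repository: start `proc = subprocess.Popen(["hg", "serve", "--cmdserver", "pipe"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)`, consume the server's hello frame once with `read_frame(proc.stdout)`, then call `runcommand(proc, [b"status", b"-n"])` as many times as needed without paying Python process startup for each command.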
> +
> +For advanced use cases, you can also implement custom extensions and
> +have machines interface with those. Extensions can implement custom
> +commands, revsets, templates, etc. They can also change behavior of
> +existing Mercurial commands. Implementing extensions is beyond the
> +scope of this help topic.

..and has serious licensing and stability implications! Let's not even
mention this here.

> +:hg:`serve` based interfaces (the hgweb and command servers) have the
> +advantage over simple ``hg`` process invocations in that they are
> +likely more efficient. This is because there is significant overhead
> +to spawn new Python processes.
> +
> +.. tip::
> +
> +   If you need to invoke several ``hg`` processes in short order and/or
> +   performance is important to you, use of a server-based interface
> +   is highly recommended.
> +
> +Environment Variables
> +=====================
> +
> +As documented in :hg:`help environment`, various environment variables
> +influence the operation of Mercurial. The following are particularly
> +relevant for machines consuming Mercurial:
> +
> +HGPLAIN
> +    If not set, Mercurial's output could be influenced by configuration
> +    settings that impact its encoding, verbose mode, localization, etc.
> +
> +    It is highly recommended for machines to set this variable when
> +    invoking ``hg`` processes.
> +
> +HGRCPATH
> +    If not set, Mercurial will inherit config options from config files
> +    using the process described in :hg:`help config`. This includes
> +    inheriting user or system-wide config files.
> +
> +    Because config files can alter the behavior of Mercurial, not
> +    setting HGRCPATH to a file with a known acceptable configuration
> +    could lead to unwanted or inconsistent operation.
> +
> +    The value of HGRCPATH can be set to an empty file or the null device
> +    (often ``/dev/null``) to bypass loading of the user and system
> +    config files.

Scripts shouldn't be clearing this, as Augie mentioned. Not only do you
lose potentially crucial extensions that the tool author hasn't
considered/heard of, you also lose username, paths, auth info..

> +HGENCODING
> +   If not set, the locale used by Mercurial will be detected from the
> +   environment.
> +
> +   Explicitly setting this environment variable (commonly to "utf-8")
> +   is a good practice to guarantee consistent results.
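A minimal Python sketch of invoking ``hg`` with these variables set, assuming Python 3.5+ for `subprocess.run`. It sets HGPLAIN and HGENCODING but deliberately leaves HGRCPATH untouched, so the user's extensions, username, paths, and auth settings still load. The helper names `hg_env` and `hg` are hypothetical.

```python
import os
import subprocess


def hg_env(base=None):
    """Return a copy of the environment suitable for scripted hg calls.

    HGPLAIN disables output-altering configuration (localization, verbose
    mode, etc.) and HGENCODING pins the encoding. HGRCPATH is not modified.
    """
    env = dict(os.environ if base is None else base)
    env["HGPLAIN"] = "1"
    env["HGENCODING"] = "utf-8"
    return env


def hg(*args, cwd=None):
    """Run hg with the plain environment and return stdout as raw bytes."""
    return subprocess.run(
        ["hg", *args],
        cwd=cwd,
        env=hg_env(),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
    ).stdout
```

Returning bytes rather than text avoids guessing how filenames and commit data are encoded.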
> +
> +Consuming Command Output
> +========================
> +
> +It is common for machines to need to parse the output of Mercurial
> +commands for relevant data.
> +
> +If you need to parse Mercurial command output, the recommended solution
> +to this problem is to avoid it if possible.

Disagree. >99% of automated entities interfacing to Mercurial will be
dead-simple and stupid shell scripts, aliases, or ad hoc pipelines. And
this is as it should be.

>  Existing libraries for the
> +Mercurial command server have already solved large parts of this
> +problem. These libraries will enable you to call a function and receive
> +a rich data structure with the results: no parsing necessary.
> +
> +If you still need to parse command output, the recommended method
> +to do that is to specify the ``-T/--template`` argument to the command
> +and ask Mercurial to emit output in a machine readable format, such as
> +JSON or XML. e.g. ``hg log -T json`` or ``hg log -T xml``. You should
> +be able to convert Mercurial's stdout into a data structure from your
> +program without writing any custom parsing code.

I can't endorse this advice either.

Reminder: JSON and XML are a disaster for binary / mixed / unspecified
encoding data that can be produced by
manifest/cat/diff/annotate/log/everything. The traditional Unix shell
tool stream-of-bytes model is way more robust here and like it or not,
is what Mercurial is designed to be.

Windows is not friends with UTF-8 and Mercurial is not friends with
UTF-16, so this whole approach is a big problem for non-ASCII filenames
on Windows.

Also doing anything with JSON or XML in our aforementioned 99% case is a
headache compared to, say, hg status -n.
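In the same stream-of-bytes spirit, a minimal Python sketch that consumes `hg status -n -0` (the `-0/--print0` flag ends each filename with NUL) and keeps names as bytes, so no encoding is assumed. It assumes Python 3.5+; the helper names `split_print0` and `modified_files` are hypothetical.

```python
import subprocess


def split_print0(output):
    """Split NUL-terminated filenames into a list of byte strings.

    The trailing terminator yields an empty element, which is dropped.
    """
    return [name for name in output.split(b"\0") if name]


def modified_files(repo):
    """List modified files in `repo` (-m), names only (-n), NUL-ended (-0)."""
    out = subprocess.run(
        ["hg", "status", "-m", "-n", "-0"],
        cwd=repo,
        stdout=subprocess.PIPE,
        check=True,
    ).stdout
    return split_print0(out)
```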

> +If for whatever reason the default output of these pre-defined machine
> +readable templates is not sufficient, try explicitly defining a
> +template to tailor output to your needs. See :hg:`help templates`.
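A minimal Python sketch of consuming an explicitly defined template, assuming the template `{rev}:{node}\n` (chosen because neither field can contain a colon or a newline) and Python 3.5+. The helper names `parse_rev_node` and `log_rev_node` are hypothetical.

```python
import subprocess


def parse_rev_node(output):
    r"""Parse `hg log -T '{rev}:{node}\n'` output into (rev, node) pairs."""
    pairs = []
    for line in output.splitlines():
        rev, node = line.split(b":", 1)
        # Revision numbers are decimal; nodes are 40 hex digits (ASCII).
        pairs.append((int(rev), node.decode("ascii")))
    return pairs


def log_rev_node(repo, revset="."):
    """Run hg log with the fixed template and parse its output."""
    out = subprocess.run(
        ["hg", "log", "-r", revset, "-T", "{rev}:{node}\n"],
        cwd=repo,
        stdout=subprocess.PIPE,
        check=True,
    ).stdout
    return parse_rev_node(out)
```

Because the template names exactly the fields the script needs, the parsing stays a one-line split.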
> +
> +If templates still don't work for you, only then should you consider
> +parsing the command's default output. This is the least desirable
> +because not only does it mean you need to write code to parse output,
> +but also command output does not have as strong guarantees around
> +backwards compatibility: upgrading Mercurial could change command
> +output and break your machine consumer.

Also not our philosophy. The things you'd actually want to parse have
pretty strong guarantees, precisely to serve the aforementioned 99%
case. 

> +
> +.. note::
> +
> +   Not all commands support templatized output. It is, however, a goal
> +   of Mercurial that all commands eventually support templatized output.

In fact, very few do and mostly in an undocumented/experimental way at
present. That includes hg log -T json.

> +.. note::
> +
> +   Commands often have varying output verbosity, even when machine
> +   readable styles are being used. Adding ``-v/--verbose`` and
> +   ``--debug`` to the command arguments can increase the amount of
> +   data exposed by Mercurial.
> +
> +Benefits of "share" Extension
> +=============================
> +
> +Machines often need to manage clones and working copies of
> +repositories. The "share" extension provides functionality for sharing
> +repository data across several working copies. It can even "pool"
> +storage for logically related repositories.

I think it's preferable to have a section of 'see also's for things like
environment variables, config variables, relevant extensions, revsets,
filesets, templates...

-- 
Mathematics is the supreme nostalgia of our time.



More information about the Mercurial-devel mailing list