RFC: Managing Mercurial Repositories Remotely

Tue Feb 19 11:26:47 CST 2008

Here's a concrete attempt at implementing rclone and rdrop.
-peo

On Feb 19, 2008 12:57 PM, Peter Arrenbrecht <peter.arrenbrecht at gmail.com> wrote:
> Hi all,
>
> This is a somewhat lengthy proposal for an extension that supports
> running a limited set of hg commands server-side. Comments welcome.
>
>
> = Managing Mercurial Repositories Remotely =
>
> The idea is to give committers an easy way to create and manage
> server-side clones, for example to create branches for collaboration,
> or just for web-based review. On servers where committers are granted
> ssh access, this is a moot point. The target here is to provide such
> management features over the http(s) protocol. Ideally, the command
> set would be accessible both in the hg web ui (hg serve) and to
> scripts.
>
> hgfront is an attempt to provide these features by way of a full-blown
> web ui. However, it aims at also providing issue tracking and a wiki.
> This is far too heavy-weight for what I have in mind. And it does not
> lend itself well to scripting, I think.
>
> I propose a pair of matching hg extensions for the client to submit
> remote commands, and for the server to run them. The server extension
> could also expose the commands in the web ui, but that is not my
> current focus. Currently, I assume that the repo server is serving a
> dynamic collection so that new clones are made available
> automatically. The client extension is not strictly necessary, of
> course, as you could also script using wget. But it is much nicer.
>
> A key point must be the security of the scheme. Beyond the usual
> safeguards of requiring https and denying all rights by default, it
> should not be possible to submit arbitrary code for execution, even if
> properly authenticated. If this were not a requirement, then you might
> as well grant ssh access.
>
> == hg rdo ==
>
> My first naive attempt was to let users submit arbitrary hg commands
> to the server. This is hard to secure. Here's an example of cloning
> main to temp and applying an mq queue to it (assuming we're in a repo
> with a versioned patch queue and the server-side working dir is set to
> the target repo's parent dir):
>
>         hg rdo https://host.org/hg/main \
>                 "clone main temp" \
>                 "-R temp qinit -c"
>         hg -R .hg/patches push https://host.org/hg/temp/.hg/patches
>         hg rdo https://host.org/hg/temp \
>                 "-R temp/.hg/patches update" \
>                 "-R temp qpush -a"
>
> I see multiple attack vectors here:
>
>         * Use the --config option to directly enable arbitrary hooks
>           and extensions.
>         * Clone a repo to another repo's .hg folder, then update, thus
>           overwriting the other repo's configuration. Would allow you
>           to plant arbitrary hooks and extensions.
>         * Clone and update a repo to a system path, thus compromising
>           any data writable by the repo server's account.
>         * Use planted repos and `hg diff` to read any file readable by
>           the repo server's account.
>
> So even just clone and update, two key commands, open up serious holes.
>
> == hg rclone et al. ==
>
> So let's try dedicated remote commands (assuming we're in a repo with
> a versioned patch queue):
>
>         hg rclone https://host.org/hg/main temp -r `hg id -i -r qparent`
>         hg rqinit https://host.org/hg/temp
>         hg -R .hg/patches push https://host.org/hg/temp/.hg/patches
>         hg rupdate https://host.org/hg/temp/.hg/patches
>         hg rqpush https://host.org/hg/temp -a
>
> and later on
>
>         hg rdrop https://host.org/hg/temp
>
> `hg rclone SRC TGT [-r REV]` clones a remote repo SRC to a sibling
> repo in the same parent directory as SRC. TGT is checked to not
> contain a path separator. This ensures clones are always in a defined,
> non-problematic location. rclone does not update the new clone. rclone
> is only allowed if SRC contains a SRC/.hg/hgrc-rclone file. If so,
> TGT/.hg/hgrc and TGT/.hg/hgrc-rclone are created as symlinks to this
> file. (If the source files were already symlinks, they are first
> traced back to their origin and the new symlinks point there. This
> ensures all symlinks point to the central location where you manage
> the config files.) The same is done for .hg/hgrc-rqinit.
>
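
A rough sketch of the "trace back to origin, then link" step, assuming
a POSIX filesystem. The helper name link_config is made up for
illustration and is not taken from the attached prototype.

```python
import os

def link_config(src_file, dst_file):
    """Create dst_file as a symlink to the origin of src_file.

    os.path.realpath follows any chain of symlinks, so the new link
    points at the central file rather than at another link.
    """
    origin = os.path.realpath(src_file)
    if not os.path.isfile(origin):
        raise ValueError("no such config file: %s" % src_file)
    os.symlink(origin, dst_file)
    return origin
```

Resolving first means a clone of a clone still links straight to the
centrally managed file, so dropping an intermediate clone does not
leave dangling links.
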
> `hg rqinit TGT` creates a versioned patch queue in the target
> repository. rqinit is only allowed if TGT/.hg/hgrc-rclone exists. If
> so, it is symlinked (as above) to TGT/.hg/patches/.hg/hgrc, unless
> TGT/.hg/hgrc-rqinit exists. If the latter exists, it is used instead.
> You could automate the reapplication of the patch queue by adding
> hooks to this file.
>
> `hg rupdate TGT [-r REV]` updates the target repo to the desired version.
>
> `hg rqgo TGT PATCH` moves the remote repo's patch queue head to the
> designated patch.
>
> `hg rqselect TGT [SELECTOR]...` selects patch guards in the remote repo.
>
> `hg rdrop TGT` deletes the target repo in its entirety. You should
> set web.allow_rdrop not in your main repo's .hg/hgrc but in its
> .hg/hgrc-rclone, so that only clones can be dropped.
>
> I propose implementing this by way of a protocol extension called
> `rdo`, which uses a separate server-side command map to configure
> available server-side commands. These will _never_ be the same as the
> normal hg commands as they need to do more stringent input validation.
> For the sake of consistency we should probably implement this for ssh
> repos too.
>
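
To make the separate command map concrete, here is a toy sketch. It
does not use the Mercurial API: the table RDO_COMMANDS, the decorator
rdocommand, dispatch, and the hex-only revision rule are all
assumptions for illustration. The handler returns the arguments it
would act on instead of touching a repo.

```python
import re

RDO_COMMANDS = {}  # kept separate from hg's normal command table

def rdocommand(name):
    """Register func as the server-side handler for `name`."""
    def register(func):
        RDO_COMMANDS[name] = func
        return func
    return register

REV_RE = re.compile(r'[0-9a-f]{1,40}')  # assumption: hex node ids only

@rdocommand('rupdate')
def rupdate(repo, rev=None):
    # Validate before doing anything; reject anything that is not a node id.
    if rev is not None and not REV_RE.fullmatch(rev):
        raise ValueError('bad revision: %r' % rev)
    # A real handler would call into Mercurial here.
    return ['update'] + (['-r', rev] if rev else [])

def dispatch(name, repo, *args):
    """Run a remote command only if it is in the rdo table."""
    try:
        handler = RDO_COMMANDS[name]
    except KeyError:
        raise ValueError('unknown remote command: %s' % name)
    return handler(repo, *args)
```

Anything not registered in the table, including every normal hg
command, is refused by dispatch before any handler runs.
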
> Once this support is in place, we might add other commands:
>
> `hg rlog TGT [OPTS]...` (as was, I think, desired by TortoiseHG). OPTS
> would only be the set of options pertaining to `hg log` as such, but
> none of the global options like --config.
>
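
One way to keep global options out of rlog is an exact-match whitelist.
The sketch below is illustrative only: check_rlog_opts is a made-up
name, and the four options listed are a small subset of what `hg log`
accepts, not a proposed final set.

```python
# Illustrative subset of `hg log` options; long forms only.
ALLOWED_RLOG_OPTS = frozenset(['--rev', '--limit', '--keyword', '--date'])

def check_rlog_opts(args):
    """Return args as a list if every option is whitelisted, else raise."""
    for arg in args:
        if not arg.startswith('-'):
            continue  # an option value, e.g. a revision
        name = arg.split('=', 1)[0]
        if name not in ALLOWED_RLOG_OPTS:
            raise ValueError('option not allowed: %s' % name)
    return list(args)
```

Matching exact names rather than blacklisting --config matters because
a getopt-style parser may also accept unique prefixes such as --conf.
The check errs on the strict side: short forms like -r and values that
begin with a dash are refused too.
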
> This approach has a number of benefits:
>
>         * Fairly simple setup. Just configure the necessary extensions
>           on client and relevant server repos.
>         * Separate configuration for main repo and clones. Separate
>           config for patch queues possible but not required.
>         * Separate or central configuration for different main repos
>           possible.
>         * Central location of server-side config possible (by using
>           symlinks and tracing origin).
>         * No need to interact with hgwebdir_mod.py. Can reuse existing
>           authentication methods in hgweb_mod.py.
>
> Problems:
>
>         * No symlinks on Windows.
>         * Security hinges crucially on recognizing path components in
>           the target repo name in rclone. In particular, this will have
>           to handle UTF encodings and .. and . properly. Also, on
>           Windows, it might be necessary to forbid both / and \ in
>           addition to :.
>
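
A possible shape for the target-name check, as a sketch under stated
assumptions: the name arrives as a Unicode string, and NFKC
normalization is an acceptable way to catch look-alike separators.
Both the policy and the name is_safe_target are hypothetical.

```python
import unicodedata

FORBIDDEN_CHARS = set('/\\:')  # separators on POSIX and Windows

def is_safe_target(name):
    """Return True if name is a single, plain path component."""
    # NFKC folds compatibility forms (e.g. fullwidth '/') to ASCII.
    norm = unicodedata.normalize('NFKC', name)
    if norm in ('', '.', '..'):
        return False
    for ch in norm:
        if ch in FORBIDDEN_CHARS:
            return False
        if unicodedata.category(ch) == 'Cc':  # NUL and other controls
            return False
    return True
```

The check is applied to the normalized form, so the server should also
create the clone under that normalized name rather than the raw input.
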
>
> Again, comments are welcome. I shall meanwhile try to hack together a
> prototype based on my rdo prototype.
> -peo
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rexec.py
Type: text/x-python
Size: 1653 bytes
Desc: not available
Url : http://selenic.com/pipermail/mercurial-devel/attachments/20080219/c4b38b99/attachment.py 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rexecweb.py
Type: text/x-python
Size: 6548 bytes
Desc: not available
Url : http://selenic.com/pipermail/mercurial-devel/attachments/20080219/c4b38b99/attachment-0001.py