RFC: Managing Mercurial Repositories Remotely

Wed Feb 20 04:35:34 CST 2008

Hi Peter,

The below proposal is interesting as I'd love to see more control over
remote repositories.

I noticed you mention hgfront in there - in which you are correct in
saying is not a nice lightweight solution to the issue.  However I'd
love to talk further on your idea and how your idea can
a) be implemented and
b) work well with hgfront.

hgfront is a component in a much larger project I've been designing,
but it's the most critical and I'd love to get it to a stable stage
within the next 2-3 months.

Some of the more advanced features for managing mercurial in our
system are going to involve being able to queue events such as
cloning.  For example, on a slow network where a 100mb clone might
take an half an hour to do for example you don't want to hold the
users browser or system up - so hgfront will be able to queue these
events and fire them as processes outside the web interface.

On Feb 19, 2008 11:57 AM, Peter Arrenbrecht <peter.arrenbrecht at gmail.com> wrote:
> Hi all,
>
> This is a somewhat lengthy proposal for an extension that supports
> running a limited set of hg commands server-side. Comments welcome.
>
>
> = Managing Mercurial Repositories Remotely =
>
> The idea is to give committers an easy way to create and manage
> server-side clones, for example to create branches for collaboration,
> or just for web-based review. On servers where committers are granted
> ssh access, this is a moot point. The target here is to provide such
> management features over the http(s) protocol. Ideally, the command
> set would be accessible both in the hg web ui (hg serve), as well as
> to scripts.
>
> hgfront is an attempt to provide these features by way of a full-blown
> web ui. However, it aims at also providing issue tracking and a wiki.
> This is far too heavy-weight for what I have in mind. And it does not
> lend itself well to scripting, I think.
>
> I propose a pair of matching hg extensions for the client to submit
> remote commands, and for the server to run them. The server extension
> could also expose the commands in the web ui, but that is not my
> current focus. Currently, I assume that the repo server is serving a
> dynamic collection so that new clones are made available
> automatically. The client extension is not strictly necessary, of
> course, as you could also script using wget. But it is much nicer.
>
> A key point must be the security of the scheme. Above the usual
> safeguards of requiring https and denying all rights by default, it
> should not be possible to submit arbitrary code for execution, even if
> properly authenticated. If this were not a requirement, then you might
> as well grant ssh access.
>
> == hg rdo ==
>
> My first naive attempt was to let users submit arbitrary hg commands
> to the server. This is hard to secure. Here's an example of cloning
> main to temp and applying an mq queue to it (assuming we're in a repo
> with a versioned patch queue and the server-side working dir is set to
> the target repo's parent dir):
>
>         hg rdo https://host.org/hg/main \
>                 "clone main temp" \
>                 "-R temp qinit -c"
>         hg -R .hg/patches push https://host.org/hg/temp/.hg/patches
>         hg rdo https://host.org/hg/temp \
>                 "-R temp/.hg/patches update" \
>                 "-R temp qpush -a"
>
> I see multiple attack vectors here:
>
>         * Use the --config option to directly enable arbitrary hooks and extensions.
>         * Clone a repo to another repo's .hg folder, then update, thus
> overwriting the other repo's configuration. Would allow you to plant
> arbitrary hooks and extensions.
>         * Clone and update a repo to a system path, thus compromising any
> data writable by the repo server's account.
>         * Use planted repos and `hg diff` to read any file readable by the
> repo server's account.
>
> So even just clone and update, two key commands, open up serious holes.
>
> == hg rclone et al. ==
>
> So let's try dedicated remote commands (assuming we're in a repo with
> a versioned patch queue):
>
>         hg rclone https://host.org/hg/main temp -r `hg id -i -r qparent`
>         hg rqinit https://host.org/hg/temp
>         hg -R .hg/patches push https://host.org/hg/temp/.hg/patches
>         hg rupdate https://host.org/hg/temp/.hg/patches
>         hg rqpush https://host.org/hg/temp -a
>
> and later on
>
>         hg rdrop https://host.org/hg/temp
>
> `hg rclone SRC TGT [-r REV]` clones a remote repo SRC to a sibling
> repo in the same path as the cloned repo. TGT is checked to not
> contain a path character. This ensures clones are always in a defined,
> non-problematic location. rclone does not update the new clone. rclone
> is only allowed if SRC contains a SRC/.hg/hgrc-rclone file. If so,
> this file is symlinked to TGT/.hg/hgrc and TGT/.hg/hgrc-rclone. (If
> the source files were already symlinks, they are first traced back to
> their origin and the new symlinks point there. This ensures all
> symlinks point to the central location where you manage the config
> files.) The same is done for .hg/hgrc-rqinit
>
> `hg rqinit TGT` creates a versioned patch queue in the target
> repository. rqinit is only allowed if TGT/.hg/hgrc-rclone exists. If
> so, it is symlinked (as above) to TGT/.hg/patches/.hg/hgrc, unless
> TGT/.hg/hgrc-rqinit exists. If the latter exists, it is used instead.
> You could automate the reapplication of the patch queue by adding
> hooks to this file.
>
> `hg rupdate TGT [-r REV]` updates the target repo to the desired version.
>
> `hg rqgo TGT patch` moves the remote repo's patch queue head to the
> designated patch.
>
> `hg rqselect TGT [SELECTOR]...` selects patch guards in the remote repo.
>
> `hg rdrop TGT` deletes the target repo in its entirety. You should not
> have web.allow_rdrop in your main repo's .hg/hgrc, but in its
> .hg/hgrc-rclone.
>
> I propose implementing this by way of a protocol extension called
> `rdo`, which uses a separate server-side command map to configure
> available server-side commands. These will _never_ be the same as the
> normal hg commands as they need to do more stringent input validation.
> For the sake of consistency we should probably implement this for ssh
> repos too.
>
> Once this support is in place, we might add other commands:
>
> `hg rlog TGT [OPTS]...` (as was, I think, desired by TortoiseHG). OPTS
> would only be the set of options pertaining to `hg log` as such, but
> none of the global options like --config.
>
> This approach has a number of benefits:
>
>         * Fairly simple setup. Just configure the necessary extensions on
> client and relevant server repos.
>         * Separate configuration for main repo and clones. Separate config
> for patch queues possible but not required.
>         * Separate or central configuration for different main repos possible.
>         * Central location of server-side config possible (by using symlinks
> and tracing origin).
>         * No need to interact with hgwebdir_mod.py. Can reuse existing
> authentication methods in hgweb_mod.py.
>
> Problems:
>
>         * No symlinks on Windows.
>         * Security hinges crucially on recognizing path components in the
> target repo name in rclone. In particular, this will have to handle
> UTF encodings and .. and . properly. Also, on Windows, it might be
> necessary to forbid both / and \ in addition to :.
>
>
> Again, comments are welcome. I shall meanwhile try to hack together a
> prototype based on my rdo prototype.
> -peo
> _______________________________________________
> Mercurial mailing list
> Mercurial at selenic.com
> http://selenic.com/mailman/listinfo/mercurial
>

-- 
Tane Piper
Blog - http://digitalspaghetti.me.uk
Wii: 4734 3486 7149 1830

This email is: [ ] blogable [ x ] ask first [ ] private