[PATCH 2 of 2 V4] hgweb: teach archive how to handle file patterns
Mads Kiilerich
mads at kiilerich.com
Wed Feb 13 19:30:16 CST 2013
Angel Ezquerra wrote, On 02/10/2013 11:56 AM:
> # HG changeset patch
> # User Angel Ezquerra <angel.ezquerra at gmail.com>
> # Date 1360493525 -3600
> # Node ID fb655ad16f6675265da9d472ded7140a223fb283
> # Parent be3e96a41d0f4b7a1f1dd443f5261d6eeb66626a
> hgweb: teach archive how to handle file patterns
>
> The archive web command now takes into account the "file" request entry, if one
> is provided.
>
> The provided "file" is processed as a "path" pattern by default, which makes it
> easy to only archive a certain file or directory. However, it is possible to
> specify a different type of pattern, such as relglob by specifying it
> explicitly on the query URL. Note that only "safe" patterns are allowed. Safe
> patterns are 'path', 'relpath', 'glog' and 'relglob'. Other pattern types are
> not allowed because they could be expensive to calculate.
>
> With this change hgweb can process requests such as:
>
> 1. http://mercurial.selenic.com/hg/archive/tip.zip/mercurial/templates
>
> This will download all files in the mercurial/templates directory as a zip
> file.
>
> 2. http://mercurial.selenic.com/hg/archive/tip.tar.gz/relglob:*.py
>
> This will download all *.py files in the repository into a tar.gz file.
>
> And so forth.
>
> Note that this is a first step to add support for downloading directories from
> the web interface. Currently the only way to use this feature is by manually
> constructing the URL that you want to download. We will have to modify the
> archiveentry map entry on the different templates so that it adds the current
> folder path to the archive links.
>
> This revision also adds two tests for this feature to test-archive.t. The
> first tests the selective archive feature and the second tests that the server
> rejects "unsafe" patterns.
>
> diff --git a/mercurial/hgweb/webcommands.py b/mercurial/hgweb/webcommands.py
> --- a/mercurial/hgweb/webcommands.py
> +++ b/mercurial/hgweb/webcommands.py
> @@ -803,6 +803,17 @@
>      if cnode == key or key == 'tip':
>          arch_version = short(cnode)
>      name = "%s-%s" % (reponame, arch_version)
> +
> +    ctx = webutil.changectx(web.repo, req)
> +    pats = []
> +    file = req.form.get('file', None)
> +    defaultpat = 'path'
> +    if file:
> +        pats = [req.form['file'][0]]
> +        if not scmutil.patsaresafe(pats, defaultpat):
> +            msg = 'Archive pattern not allowed: %s' % pats[0]
> +            raise ErrorResponse(HTTP_FORBIDDEN, msg)
> +
>      mimetype, artype, extension, encoding = web.archive_specs[type_]
>      headers = [
>          ('Content-Disposition', 'attachment; filename=%s%s' % (name, extension))
> @@ -812,9 +823,9 @@
>      req.headers.extend(headers)
>      req.respond(HTTP_OK, mimetype)
>
> -    ctx = webutil.changectx(web.repo, req)
> +    matchfn = scmutil.match(ctx, pats, default=defaultpat)
>      archival.archive(web.repo, req, cnode, artype, prefix=name,
> -                     matchfn=scmutil.match(ctx, []),
> +                     matchfn=matchfn,
>                       subrepos=web.configbool("web", "archivesubrepos"))
>      return []
>
> diff --git a/mercurial/scmutil.py b/mercurial/scmutil.py
> --- a/mercurial/scmutil.py
> +++ b/mercurial/scmutil.py
> @@ -682,6 +682,15 @@
>
> return l
>
> +def patsaresafe(pats, defaultpattype):
> +    for pat in pats:
> +        pattype = defaultpattype
> +        if ':' in pat:
> +            pattype = pat.split(':')[0]
> +        if pattype.lower() not in ('path', 'relpath', 'glog', 'relglob'):

(btw: relpath and relglob are completely undocumented in the patterns help.)

'glog' seems to be a typo for 'glob'. That indicates that the feature doesn't
have good test coverage and hasn't been fully tested manually either.
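
To make the consequence of the typo concrete, here is a minimal standalone sketch of the whitelist check. The quoted hunk is cut off before the function ends, so the return statements and the extra `safekinds` parameter are assumptions added for illustration; this is not the code from the patch.

```python
# Hypothetical reconstruction of the check in the quoted hunk. The tail of the
# real function is not shown in the patch excerpt, so the returns are assumed.
SAFE_KINDS_AS_POSTED = ('path', 'relpath', 'glog', 'relglob')  # typo kept on purpose
SAFE_KINDS_INTENDED = ('path', 'relpath', 'glob', 'relglob')


def patsaresafe(pats, defaultpattype, safekinds):
    """Return True only if every pattern uses a whitelisted pattern kind."""
    for pat in pats:
        pattype = defaultpattype
        if ':' in pat:
            # Everything before the first ':' is treated as the pattern kind.
            pattype = pat.split(':')[0]
        if pattype.lower() not in safekinds:
            return False
    return True


if __name__ == '__main__':
    # With the list as posted, an explicit glob: pattern is rejected.
    print(patsaresafe(['glob:*.py'], 'path', SAFE_KINDS_AS_POSTED))
    # With the presumably intended list it is accepted.
    print(patsaresafe(['glob:*.py'], 'path', SAFE_KINDS_INTENDED))
```

Under these assumptions a single test requesting a `glob:` pattern would have caught the typo.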

But both kinds of globs are, in my opinion, not sufficiently safe.
Consider for example the execution time for
  hg locate "glob:*********************x"
and how something like that can be used for denial-of-service attacks against
hgweb.
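
A rough way to see the effect outside of Mercurial is the sketch below. It assumes a simplified glob translation in which every `*` becomes `[^/]*`; the helper names are hypothetical and this is not Mercurial's actual translation code. It times Python's backtracking `re` engine on a name that cannot match.

```python
import re
import time


def glob_to_regex(pat):
    """Simplified stand-in for a glob translator: '*' -> '[^/]*', rest escaped."""
    return ''.join('[^/]*' if c == '*' else re.escape(c) for c in pat) + r'\Z'


def time_match(stars, name):
    """Match '<stars times *>x' against name; return (matched, seconds)."""
    regex = re.compile(glob_to_regex('*' * stars + 'x'))
    start = time.perf_counter()
    matched = regex.match(name) is not None
    return matched, time.perf_counter() - start


if __name__ == '__main__':
    # A name without any 'x' can never match, so a backtracking engine may
    # retry many ways of dividing the name between the stars before giving up.
    name = 'a' * 24
    for stars in range(1, 7):
        matched, elapsed = time_match(stars, name)
        print('%d stars: matched=%s, %.4f s' % (stars, matched, elapsed))
```

The sizes here are kept small on purpose. On a backtracking engine the time per file is expected to grow quickly with each extra star, and a real request would pay that cost for every file in the manifest.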

I must say that I am no big fan of this feature as it is.

* It is conceptually too complex compared to the value it adds.
* Patterns cannot be made explorable in hgweb, and the feature is undocumented
  with no good place to document it.
* URLs thus have to be constructed manually ... and it is not obvious how
  to encode, for instance, globs with '?' in a URL.
* It doesn't have the full power of specifying multiple patterns with -X
  and -I as we are used to when using patterns.
* It is not obvious which subset of patterns can be used.
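
To illustrate the encoding problem from the list above: a literal '?' in the path would start the query string, so it has to be percent-encoded by whoever builds the URL. The sketch below uses the Python 3 standard library; the host name and the `archive_url` helper are made up for the example, and whether hgweb decodes the result back into the intended pattern is an assumption that would need testing.

```python
from urllib.parse import quote, unquote


def archive_url(base, rev_and_ext, pattern):
    """Build an archive URL with the pattern percent-encoded (hypothetical helper)."""
    # safe='/' keeps directory separators readable; characters such as
    # '?', '*' and ':' are percent-encoded.
    encoded = quote(pattern, safe='/')
    return '%s/archive/%s/%s' % (base.rstrip('/'), rev_and_ext, encoded)


if __name__ == '__main__':
    url = archive_url('http://example.invalid/hg', 'tip.zip', 'relglob:?.py')
    print(url)
    # Decoding the last path component gives the original pattern back.
    print(unquote(url.rsplit('/', 1)[1]))
```

Nobody is going to type `relglob%3A%3F.py` by hand, which is part of why this does not feel like a usable interface.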

If something in this area is needed then I would suggest focusing on
just making it possible to download a single directory as a tar file.
There is no need for a pattern - we only need a path after the archive,
for instance .../archive/REV.tar.bz2/sub/dir .
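
As a sketch of how little is needed for the directory-only variant: a plain prefix test over the file list is enough, with no pattern language involved. The function name and the handling of odd input are assumptions made for illustration, not an existing Mercurial API.

```python
import posixpath


def under_directory(files, subdir):
    """Return the files located in subdir (or the single file named subdir)."""
    subdir = posixpath.normpath(subdir).strip('/')
    if subdir in ('', '.'):
        return list(files)  # no restriction requested: archive everything
    if subdir == '..' or subdir.startswith('../'):
        raise ValueError('path escapes the repository: %r' % subdir)
    prefix = subdir + '/'
    # Comparing against 'sub/dir/' rather than 'sub/dir' avoids accidentally
    # selecting siblings such as 'sub/dirty.txt'.
    return [f for f in files if f == subdir or f.startswith(prefix)]


if __name__ == '__main__':
    files = ['README', 'sub/dir/a.txt', 'sub/dir/deep/b.txt', 'sub/dirty.txt']
    print(under_directory(files, 'sub/dir'))
```

The cost is one string comparison per file, so there is nothing for a client to make expensive.
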
/Mads