[PATCH 2 of 2 V4] hgweb: teach archive how to handle file patterns

Mads Kiilerich mads at kiilerich.com
Wed Feb 13 19:30:16 CST 2013


Angel Ezquerra wrote, On 02/10/2013 11:56 AM:
> # HG changeset patch
> # User Angel Ezquerra <angel.ezquerra at gmail.com>
> # Date 1360493525 -3600
> # Node ID fb655ad16f6675265da9d472ded7140a223fb283
> # Parent  be3e96a41d0f4b7a1f1dd443f5261d6eeb66626a
> hgweb: teach archive how to handle file patterns
>
> The archive web command now takes into account the "file" request entry, if one
> is provided.
>
> The provided "file" is processed as a "path" pattern by default, which makes it
> easy to only archive a certain file or directory. However, it is possible to
> specify a different type of pattern, such as relglob by specifying it
> explicitly on the query URL. Note that only "safe" patterns are allowed. Safe
> patterns are 'path', 'relpath', 'glog' and 'relglob'. Other pattern types are
> not allowed because they could be expensive to calculate.
>
> With this change hgweb can process requests such as:
>
> 1. http://mercurial.selenic.com/hg/archive/tip.zip/mercurial/templates
>
>      This will download all files in the mercurial/templates directory as a zip
>      file.
>
> 2. http://mercurial.selenic.com/hg/archive/tip.tar.gz/relglob:*.py
>
>      This will download all *.py files in the repository into a tar.gz file.
>
> And so forth.
>
> Note that this is a first step to add support for downloading directories from
> the web interface. Currently the only way to use this feature is by manually
> constructing the URL that you want to download. We will have to modify the
> archiveentry map entry on the different templates so that it adds the current
> folder path to the archive links.
>
> This revision also adds two tests for this feature to test-archive.t. The
> first tests the selective archive feature and the second tests that the server
> rejects "unsafe" patterns.
>
> diff --git a/mercurial/hgweb/webcommands.py b/mercurial/hgweb/webcommands.py
> --- a/mercurial/hgweb/webcommands.py
> +++ b/mercurial/hgweb/webcommands.py
> @@ -803,6 +803,17 @@
>       if cnode == key or key == 'tip':
>           arch_version = short(cnode)
>       name = "%s-%s" % (reponame, arch_version)
> +
> +    ctx = webutil.changectx(web.repo, req)
> +    pats = []
> +    file = req.form.get('file', None)
> +    defaultpat = 'path'
> +    if file:
> +        pats = [req.form['file'][0]]
> +        if not scmutil.patsaresafe(pats, defaultpat):
> +            msg = 'Archive pattern not allowed: %s' % pats[0]
> +            raise ErrorResponse(HTTP_FORBIDDEN, msg)
> +
>       mimetype, artype, extension, encoding = web.archive_specs[type_]
>       headers = [
>           ('Content-Disposition', 'attachment; filename=%s%s' % (name, extension))
> @@ -812,9 +823,9 @@
>       req.headers.extend(headers)
>       req.respond(HTTP_OK, mimetype)
>   
> -    ctx = webutil.changectx(web.repo, req)
> +    matchfn = scmutil.match(ctx, pats, default=defaultpat)
>       archival.archive(web.repo, req, cnode, artype, prefix=name,
> -                     matchfn=scmutil.match(ctx, []),
> +                     matchfn=matchfn,
>                        subrepos=web.configbool("web", "archivesubrepos"))
>       return []
>   
> diff --git a/mercurial/scmutil.py b/mercurial/scmutil.py
> --- a/mercurial/scmutil.py
> +++ b/mercurial/scmutil.py
> @@ -682,6 +682,15 @@
>   
>       return l
>   
> +def patsaresafe(pats, defaultpattype):
> +    for pat in pats:
> +        pattype = defaultpattype
> +        if ':' in pat:
> +            pattype = pat.split(':')[0]
> +        if pattype.lower() not in ('path', 'relpath', 'glog', 'relglob'):

(btw: relpath and relglob are completely undocumented in the patterns help.)

'glog' seems to be a typo for 'glob'. That indicates that the feature 
doesn't have good test coverage and also hasn't been fully tested manually.
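
To make the consequence of the typo concrete, here is a minimal sketch that re-creates the whitelist check from the quoted hunk. The hunk is cut off right after the "not in" test, so the two return statements and the name pats_are_safe_sketch are assumptions for illustration, not the patch's actual code.

```python
# Illustrative re-creation of the check in the quoted hunk.
# The tuple is copied verbatim, including the 'glog' typo.
SAFE_TYPES = ('path', 'relpath', 'glog', 'relglob')


def pats_are_safe_sketch(pats, defaultpattype):
    for pat in pats:
        pattype = defaultpattype
        if ':' in pat:
            pattype = pat.split(':')[0]
        if pattype.lower() not in SAFE_TYPES:
            return False  # assumed: the quoted hunk ends before this line
    return True  # assumed


# 'relglob:' passes the check, but 'glob:' is rejected because only the
# misspelled 'glog' is in the list.
print(pats_are_safe_sketch(['relglob:*.py'], 'path'))
print(pats_are_safe_sketch(['glob:*.py'], 'path'))
```

Under these assumptions, a test that requested a 'glob:' archive would have failed with the "not allowed" error, which is why the typo suggests that case was never exercised.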

But both kinds of globs are, in my opinion, not sufficiently safe. 
Consider for example the execution time for
   hg locate "glob:*********************x"
and how something like that can be used for denial-of-service attacks 
against hgweb.
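
The growth in cost can be illustrated with a toy matcher. This is only a sketch: it is a naive backtracking glob matcher written for this example (count_steps is a hypothetical name), not the code Mercurial uses to evaluate patterns. It counts how many match attempts are needed when every '*' has to try every possible length before the pattern finally fails.

```python
def count_steps(pattern, text):
    """Naive backtracking matcher supporting only '*' and literals.

    Returns (matched, steps), where steps is the number of recursive
    match attempts that were made.
    """
    steps = 0

    def match(pi, ti):
        nonlocal steps
        steps += 1
        if pi == len(pattern):
            return ti == len(text)
        if pattern[pi] == '*':
            # Try every possible length for this star.
            for end in range(ti, len(text) + 1):
                if match(pi + 1, end):
                    return True
            return False
        return (ti < len(text)
                and pattern[pi] == text[ti]
                and match(pi + 1, ti + 1))

    return match(0, 0), steps


# A file name with no 'x' forces the matcher to exhaust every split.
name = 'a' * 12
for stars in range(1, 6):
    matched, steps = count_steps('*' * stars + 'x', name)
    print(stars, matched, steps)
```

Each additional '*' multiplies the work for a non-matching name, so a short request string can be made arbitrarily expensive for a matcher that backtracks like this.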


I must say that I am no big fan of this feature as it is.
* It is conceptually too complex compared to the value it adds.
* Patterns cannot be made explorable in hgweb; the feature is undocumented, 
and there is no good place to document it.
* URLs thus have to be constructed manually ... and it is not obvious how 
to encode, for instance, globs with '?' in a URL.
* It doesn't have the full power of specifying multiple patterns with -X 
and -I as we are used to when using patterns.
* It is not obvious which subset of patterns can be used.
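
On the URL encoding point in the list above: a '?' in a URL path starts the query string, so a glob containing one has to be percent-encoded by whoever builds the link. A minimal sketch in Python 3, using the base URL from the quoted examples; archive_url is a hypothetical helper, and whether hgweb would decode the result back into the intended pattern is an assumption that would need testing.

```python
from urllib.parse import quote, unquote

# Base URL taken from the examples in the quoted commit message.
BASE = 'http://mercurial.selenic.com/hg/archive/tip.zip/'


def archive_url(pattern):
    # Keep '/' and ':' readable; other characters outside the unreserved
    # set, including '*' and '?', are percent-encoded.
    return BASE + quote(pattern, safe='/:')


url = archive_url('relglob:*.p?')
print(url)
# Decoding the path part recovers the original pattern.
print(unquote(url[len(BASE):]))
```

A user typing such a URL by hand would have to know to write %3F for '?', which is part of why manually constructed pattern URLs are awkward.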

If something in this area is needed then I would suggest focusing on 
just making it possible to download a single directory as a tar file. 
There is no need for a pattern - we only need a path after the archive, 
for instance .../archive/REV.tar.bz2/sub/dir .
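
A path-only variant could look roughly like the sketch below. It is a standalone illustration in plain Python, not Mercurial's matcher API: make_dir_matcher is a hypothetical name, and the choice to reject paths that escape the repository root with ValueError is an assumption about what the server would want.

```python
import posixpath


def make_dir_matcher(subdir):
    """Return a predicate that is true for files at or below subdir.

    No pattern syntax is interpreted, so the cost per file is a single
    string prefix comparison.
    """
    norm = posixpath.normpath(subdir).strip('/')
    if norm == '..' or norm.startswith('../'):
        raise ValueError('path escapes the repository: %s' % subdir)
    if norm in ('', '.'):
        return lambda filename: True  # no sub-path: archive everything
    prefix = norm + '/'
    return lambda filename: filename == norm or filename.startswith(prefix)


match = make_dir_matcher('sub/dir')
files = ['sub/dir/a.txt', 'sub/directory/b.txt', 'other/c.txt']
print([f for f in files if match(f)])
```

Because the trailing '/' is part of the prefix, sub/directory is not mistaken for something inside sub/dir, and there is no pattern type for a client to choose or abuse.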

/Mads

