[PATCH 1 of 3] hgweb: allow urlencoded forward slashes in specified revisions

Anton Shestakov av6 at dwimlabs.net
Mon Jul 13 09:25:12 CDT 2015


On Mon, 13 Jul 2015 22:41:03 +0900
Yuya Nishihara <yuya at tcha.org> wrote:

> On Mon, 13 Jul 2015 20:18:58 +0800, Anton Shestakov wrote:
> > # HG changeset patch
> > # User Anton Shestakov <av6 at dwimlabs.net>
> > # Date 1436688417 -28800
> > #      Sun Jul 12 16:06:57 2015 +0800
> > # Node ID 4b2713531ee8955fd282ae77cedfc3308c7fa98a
> > # Parent  648323f41a89619d9eeeb7287213378c340866c8
> > hgweb: allow urlencoded forward slashes in specified revisions
> > 
> > It's possible to have a branch/tag/bookmark with all kinds of
> > special characters, such as {}/\!?. While not very conveniently,
> > symbolic revisions with such characters work from command line if
> > user correctly quotes the characters. These characters also work in
> > hgweb, when they are properly encoded, with one exception:
> > '/' (forward slash, urlencoded as '%2F'), which was getting decoded
> > before hgweb could parse it as a part of PATH_INFO. Because of
> > that, hgweb was seeing it as any other forward slash, that is, as
> > just another url parts separator.
> > 
> > For example, if user wanted to see the content of dir/file at
> > bookmark 'feature/eggs', url could be:
> > '/file/feature%2Feggs/dir/file'. But hgweb tried to find a revision
> > 'feature' and get contents of 'eggs/dir/file'.
> > 
> > To fix this, let's inspect the contents of REQUEST_URI (which, if
> > present, contains "raw" url), find '%2F' in revision identifier and
> > then re-join parts of PATH_INFO that were split on decoded '%2F'
> > characters (as opposed to real forward slashes).
> 
> It's a source of trouble to escape '/' in path component as '%2F'.
> For example, if an hgweb is behind an NGINX reverse proxy, we can't
> see the original encoded URI but the normalized version.

Huh. Indeed, it didn't work on my nginx+gunicorn setup. Although I
don't think it's because of nginx: rather, it's the WSGI server that
parses the raw URL. gunicorn doesn't pass REQUEST_URI, but it does
pass RAW_URI (which contains the raw %2F).

So option number one is to also try req.env['RAW_URI'].
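
A rough sketch of that fallback, just to show the idea. Here a plain
dict stands in for req.env, and raw_request_path is a made-up helper
name, not something that exists in hgweb. Neither key is part of the
WSGI spec, so the caller still has to cope with both being absent:

```python
def raw_request_path(env):
    """Return the undecoded request path, or None if the server hides it.

    `env` is a WSGI-style environ dict. REQUEST_URI and RAW_URI are
    server-specific extras, so either (or both) may be missing.
    """
    for key in ('REQUEST_URI', 'RAW_URI'):
        raw = env.get(key)
        if raw:
            # Drop the query string; only the path part is of interest.
            return raw.split('?', 1)[0]
    return None
```

If this returns None, hgweb would have to fall back to plain PATH_INFO,
where an encoded slash can no longer be told apart from a real one.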

> If we want to support "foo/bar" tags, I think we have to introduce a
> different encoding scheme or ?query= syntax.

?query= (or rather ?node=) syntax would work, but it would require
rewriting every place that passes revision context to another page (so
the majority of hgweb templates). Also, it doesn't look nice, and
looking nice was the point of introducing the human-friendly URL
scheme into hgweb.

But a different encoding scheme is an interesting idea. I've tried
another solution for this: doubly encoding slashes into %252F. That
seems to work (and is a suggested solution for urlencoded slashes in
Apache httpd), so that's option number two, and if nobody has
objections, I will try to send a V2 with a revescape filter that does
this.
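
To illustrate the double-encoding idea, here is a minimal sketch. It is
not the V2 patch: the body of revescape and the parse_path_info helper
are assumptions made up for this example, and the server's single
decoding pass is simulated with unquote():

```python
from urllib.parse import quote, unquote


def revescape(rev):
    # Hypothetical filter: quote the revision name, then encode the
    # slash a second time so it survives one decoding pass by the server.
    return quote(rev, safe='').replace('%2F', '%252F')


def parse_path_info(path_info):
    # Hypothetical parser for /<cmd>/<rev>/<path>. Split on real
    # slashes first, then decode the revision component once more.
    parts = path_info.strip('/').split('/')
    cmd = parts[0]
    rev = unquote(parts[1])
    path = '/'.join(parts[2:])
    return cmd, rev, path


url = '/file/' + revescape('feature/eggs') + '/dir/file'
# The web server decodes the URL once before filling in PATH_INFO.
path_info = unquote(url)
cmd, rev, path = parse_path_info(path_info)
```

The slash inside the bookmark name reaches hgweb as a literal '%2F', so
it is no longer confused with the separators around it.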

Thanks for the feedback!

