[PATCH 19 of 19] util: don't encode ':' in url paths

Mads Kiilerich mads at kiilerich.com
Mon Nov 7 07:56:51 CST 2011


On 11/07/2011 09:04 AM, Maxim Dounin wrote:
> Hello!
>
> On Mon, Nov 07, 2011 at 03:41:09AM +0100, Mads Kiilerich wrote:
>
>> # HG changeset patch
>> # User Mads Kiilerich<mads at kiilerich.com>
>> # Date 1320632710 -3600
>> # Node ID 4fe69cfd994abbc0a1cf00e26bc3e48037923bcb
>> # Parent  f88984c9f46c77e21500cc7e4c50bb100789a83f
>> util: don't encode ':' in url paths
>>
>> ':' has no special meaning in paths, so there is no need for encoding it.
>
> This isn't really true:
>
>     ... In addition, a URI reference
>     (Section 4.1) may be a relative-path reference, in which case the
>     first path segment cannot contain a colon (":") character.
>
> (from RFC 3986, http://tools.ietf.org/html/rfc3986#section-3.3)
>
> The colon is critical to distinguish the first path segment of a
> relative reference from an absolute URI starting with scheme
> ("mailto:something" is an URI in the "mailto" scheme, while
> "mailto%3Asomething" is a relative-path reference).
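
The distinction quoted above can be seen with a generic RFC-style splitter. The following is only a sketch using Python 3's urllib.parse, not Mercurial's url class; the helper name describe is made up for illustration. It also shows why a strict parser is awkward for Windows paths: a drive letter is read as a scheme.

```python
from urllib.parse import urlsplit


def describe(ref):
    """Return (scheme, path) as seen by a generic RFC-style splitter."""
    parts = urlsplit(ref)
    return parts.scheme, parts.path


# An unencoded colon in the first segment is read as a scheme delimiter.
print(describe("mailto:something"))    # ('mailto', 'something')

# With the colon percent-encoded, the whole string is a relative path.
print(describe("mailto%3Asomething"))  # ('', 'mailto%3Asomething')

# A Windows drive letter is indistinguishable from a one-letter scheme.
print(describe("c:/foo/bar"))          # ('c', '/foo/bar')
```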

Mercurial's url class is not intended to be a strict implementation of
RFC 3986. It is more important that it remains backward compatible and
can handle plain filenames (as described in hg help urls), also on Windows.

The url class can thus handle all these cases like this:

     >>> url(r'c:\foo\bar')
     <url path: 'c:\\foo\\bar'>

     >>> url('c:foo/bar')
     <url path: 'c:foo/bar'>

     >>> url('c://foo/bar')
     <url path: 'c://foo/bar'>

     >>> url('c:/foo/bar')
     <url path: 'c:/foo/bar'>

     >>> url('file:c:/foo/bar')
     <url scheme: 'file', path: 'c:/foo/bar'>

     >>> url('file:///c:/foo/bar')
     <url scheme: 'file', path: 'c:/foo/bar'>

but converting back to a string currently gives

     >>> str(url('file:c:/foo/bar'))
     'file:c%3A/foo/bar'

     >>> str(url('file:///c:/foo/bar'))
     'file:c%3A/foo/bar'

With this change we would get

     >>> str(url('file:c:/foo/bar'))
     'file:c:/foo/bar'
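
The difference between the two outputs comes down to whether ':' is in the set of characters left unencoded when the path is quoted. A rough sketch with Python 3's urllib.parse.quote follows; it is not the code in the patch, and encode_path, format_url and keep_colon are hypothetical names used only here.

```python
from urllib.parse import quote


def encode_path(path, keep_colon):
    """Percent-encode a path, optionally treating ':' as safe."""
    safe = "/:" if keep_colon else "/"
    return quote(path, safe=safe)


def format_url(scheme, path, keep_colon):
    """Join a scheme and an encoded path into a string."""
    return scheme + ":" + encode_path(path, keep_colon)


# ':' is encoded: matches the current output shown above.
print(format_url("file", "c:/foo/bar", keep_colon=False))  # file:c%3A/foo/bar

# ':' is kept: matches the output proposed by this change.
print(format_url("file", "c:/foo/bar", keep_colon=True))   # file:c:/foo/bar
```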

Do you see any real-world examples where this change would be bad for
Mercurial's use of urls?

(For the record: we have a known issue with handling of encoded / in 
paths. I admit that this change could be seen as taking an extra step in 
the wrong direction.)
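
As a generic illustration of why an encoded / is delicate (this is the general pitfall, not a description of the specific Mercurial issue): if a path is decoded before it is split into segments, an encoded slash becomes indistinguishable from a separator. The function names below are made up for the sketch.

```python
from urllib.parse import quote, unquote


def decode_then_split(encoded_path):
    """Lossy: '%2F' turns into a separator before splitting."""
    return unquote(encoded_path).split("/")


def split_then_decode(encoded_path):
    """Preserves segments: split on real separators, then decode each."""
    return [unquote(segment) for segment in encoded_path.split("/")]


# Two segments; the first contains a literal slash.
segments = ["a/b", "c"]
encoded = "/".join(quote(s, safe="") for s in segments)

print(encoded)                     # a%2Fb/c
print(decode_then_split(encoded))  # ['a', 'b', 'c']
print(split_then_decode(encoded))  # ['a/b', 'c']
```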

It would be nice to have a better overview of the ways in which
Mercurial urls differ from RFC urls.

/Mads

