[PATCH 11 of 11 RFC] url: refactor util.drop_scheme() and hg.localpath() into url.localpath()

Mads Kiilerich mads at kiilerich.com
Sun Mar 27 17:34:32 CDT 2011


Thanks for finishing these patches!

It should perhaps be noted that RFC 2396 is quite clear that the slash 
after a hostname syntactically is a part of the path, and that is also 
how Pythons urlparse.urlsplit does it. That is however not how URLs 
usually is used and it is also not how Mercurial sees it (which often 
confuses users using ssh urls with absolute paths). This parser 
considers the first slash after the hostname a separator that isn't a 
part of the path. Excellent!

(It is perhaps not so relevant to test that we can parse ssh urls with 
passwords. That do however show that it is a generic parser without any 
hardcoded scheme assumptions - which is fine for our purpose.)

Some comments to the last patch:

Brodie Rao wrote, On 03/26/2011 07:29 AM:
> # HG changeset patch
> # User Brodie Rao<brodie at bitheap.org>
> # Date 1301119907 25200
> # Node ID 0d6a6ff7d3c7703672f40ed994b721ecd51e0222
> # Parent  79716c21ef9a7816d71fc8da1b6a3dd051d507bb
> url: refactor util.drop_scheme() and hg.localpath() into url.localpath()
>
> This replaces util.drop_scheme() with url.localpath(), using url.url for
> parsing instead of doing it on its own. The function is moved from
> util to url to avoid an import cycle.
>
> hg.localpath() is removed in favor of using url.localpath(). This
> provides more consistent behavior between "hg clone" and other
> commands.
>
> This change has a side effect of making URLs like bundle://../foo not
> correspond to relative paths with "hg bundle". This is because ".." is
> considered to be the host for the URL. The URL should be bundle:../foo
> to be relative.

This doesn't seem right to me.

That would be a behavioral change where an existing and commonly used 
way of specifying a bundle no longer works. Some tests in the test suite 
even had to be changed because they used the syntax that no longer works.

I think we will need a hack that makes bundle 'urls' stay backward 
compatible.

I tend to see 'bundle:' as just a prefix, just like 're:' and 'glob:'. 
Schemes do usually specify the access/transport mechanism to access a 
resource, but bundles actually uses the file mechanism (at least by 
default) - the difference is only in what kind of resource is found and 
how it is used. It could in principle also make sense to use 
'bundle:http://server/bundle.hg'.

Would it be possible to split this patch up and do the refactorings 
without changing bundle?

> Invalid URLs like the one above never worked with push, in, out, etc.,
> so this makes URL handling more consistent.

Would the url help topic need a review after these patches?

> Comparison of old and new behaviors:
>
> URL                      drop_scheme()   hg.localpath()    url.localpath()
> ===                      =============   ==============    ===============
> file://foo/foo           /foo            foo/foo           /foo
> file://localhost:80/foo  /foo            localhost:80/foo  /foo
> file://localhost:/foo    /foo            localhost:/foo    /foo
> file://localhost/foo     /foo            /foo              /foo
> file:///foo              /foo            /foo              /foo
> file://foo               (empty string)  foo               /
> file:/foo                /foo            /foo              /foo
> file:foo                 foo             foo               foo
> /foo                     /foo            /foo              /foo
> file:///C:/foo           /C:/foo         /C:/foo           /C:/foo
> file://C:/foo            /foo            C:/foo            /foo
> file://D:/foo            /foo            D:/foo            /foo

Do we really want to silently accept and ignore any 'host' specification 
and thus also ignore 'drive letters'? Wouldn't it be better to abort if 
the host is something it shouldn't be, and thus only accept '' and 
'localhost' - and driveletters on windows?

> On Windows:
>
> URL                      drop_scheme()   hg.localpath()    url.localpath()
> ===                      =============   ==============    ===============
> file://localhost:80/foo  C:/foo          localhost:80/foo  /foo
> file://localhost:/foo    C:/foo          localhost:/foo    /foo
> file://localhost/foo     C:/foo          C:/foo            /foo
> file:///foo              C:/foo          C:/foo            /foo
> file://foo               (empty string)  foo               /
> file:/foo                /foo            /foo              /foo
> file:foo                 foo             foo               foo
> /foo                     /foo            /foo              /foo
> file:///C:/foo           C:/C:/foo       /C:/foo           C:/foo
> file:///D:/foo           C:/D:/foo       /D:/foo           D:/foo
> file://C:/foo            C:/foo          C:/foo            C:/foo
> file://D:/foo            C:/foo          D:/foo            D:/foo

It could be (relatively) interesting to see 'file:C:/foo' in the table 
as well.

> diff --git a/mercurial/url.py b/mercurial/url.py
> --- a/mercurial/url.py
> +++ b/mercurial/url.py
> @@ -63,6 +63,8 @@ class url(object):
>           self.scheme = self.user = self.passwd = self.host = None
>           self.port = self.path = self.query = self.fragment = None
>           self._localpath = True
> +        self._hostport = ''
> +        self._origpath = path
>
>           if not path.startswith('/') and ':' in path:
>               parts = path.split(':', 1)
> @@ -115,6 +117,7 @@ class url(object):
>               # Don't split on colons in IPv6 addresses without ports
>               if (self.host and ':' in self.host and
>                   not (self.host.startswith('[') and self.host.endswith(']'))):
> +                self._hostport = self.host
>                   self.host, self.port = self.host.rsplit(':', 1)
>                   if not self.host:
>                       self.host = None
> @@ -201,9 +204,31 @@ class url(object):
>           return (s, (None, (str(self), self.host),
>                       self.user, self.passwd or ''))
>
> +    def localpath(self):
> +        if self.scheme == 'file' or self.scheme == 'bundle':

Wouldn't it be cleaner to avoid hardcoding of the bundle format and have 
something like

     def localpath(self, filescheme='file'):


> +            path = self.path or '/'
> +            # On Windows, we need to promote hosts containing drive
> +            # letters to paths with drive letters.
> +            if (os.name == 'nt' and len(self._hostport) == 2 and
> +                self._hostport[0].isalpha() and self._hostport[1] == ':'):
> +                path = self._hostport + '/' + self.path
> +            elif self.host is not None and self.path:
> +                path = '/' + path
> +            # We also need to handle the case of file:///C:/, which
> +            # should return C:/, not /C:/.
> +            elif (os.name == 'nt' and path.startswith('/') and
> +                  len(path)>  2 and path[1].isalpha() and path[2] == ':'):
> +                # Strip leading slash from paths with drive names
> +                return path[1:]
> +            return path
> +        return self._origpath
> +
>   def has_scheme(path):
>       return bool(url(path).scheme)

btw: would it make sense here to normalize 'file' urls to be just their 
path and no scheme?

>
> +def localpath(path):
> +    return url(path, parse_query=False, parse_fragment=False).localpath()
> +
>   def hidepassword(u):
>       '''hide user credential in a url string'''
>       u = url(u)

/Mads


More information about the Mercurial-devel mailing list