subrepos with ssh urls with absolute paths

Mads Kiilerich mads at kiilerich.com
Sun Sep 12 19:15:31 CDT 2010


[mk at D610 hg]$ cd /tmp
[mk at D610 tmp]$ hg init repo
[mk at D610 tmp]$ hg init repo/s
[mk at D610 tmp]$ echo a > repo/s/a
[mk at D610 tmp]$ hg -R repo/s ci -Am0
adding a
[mk at D610 tmp]$ echo s = s > repo/.hgsub
[mk at D610 tmp]$ hg -R repo ci -Am1
adding .hgsub
committing subrepository s
[mk at D610 tmp]$ hg clone ssh://localhost//tmp/repo repo2
requesting all changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 2 changes to 2 files
updating to branch default
pulling subrepo s from ssh://localhost/tmp/repo/s
remote: abort: There is no Mercurial repository here (.hg not found)!
abort: no suitable response from remote hg!
[mk at D610 tmp]$ grep default repo2/.hg/hgrc
default = ssh://localhost//tmp/repo
[mk at D610 tmp]$ grep default repo2/s/.hg/hgrc
default = ssh://localhost/tmp/repo/s

For some reason the double slash isn't used for the subrepo. That is the 
issue reported in 
http://selenic.com/pipermail/mercurial/2010-September/034726.html .

This is caused by mercurial/subrepo.py _abssource which now uses 
urlparse and normpath.

A:
http://www.selenic.com/hg/rev/a2bc2f2d77a9 introduced normpath on the 
path part in order to handle ..-relative paths in http URLs. Normpath 
might be fine on http URLs in all cases, but it isn't fine on ssh URL 
paths, where a leading double slash has special semantics (it marks an 
absolute path on the remote host). That could perhaps be solved by 
normalizing path[1:] if it weren't for

B:
As discussed in http://bugs.python.org/issue7904, Python's urlparse 
tries to be smart and handles known protocols differently, and thus 
doesn't find a netloc (host) part in an ssh URL:
>>> urlparse.urlparse('http://foo//bar')
ParseResult(scheme='http', netloc='foo', path='//bar', params='', query='', fragment='')
>>> urlparse.urlparse('ssh://foo//bar')
ParseResult(scheme='ssh', netloc='', path='//foo//bar', params='', query='', fragment='')

(but
>>> urlparse.urlunparse(urlparse.urlparse('ssh://foo//bar'))
'ssh://foo//bar'
)
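
To see how A and B combine, here is a small sketch that only uses 
posixpath. The path string is copied from the ParseResult above; the 
variable names are made up for illustration, and the final join by hand 
is only meant to mimic the urlunparse round trip shown above, not to 
reproduce subrepo.py exactly.

```python
import posixpath

# Path component as urlparse reported it for 'ssh://foo//bar' above,
# with the host folded into the path (illustrative variable name).
misparsed_path = '//foo//bar'

# normpath keeps exactly two leading slashes but collapses the later
# '//' that ssh needs in order to mean "absolute path on the host".
normalized = posixpath.normpath(misparsed_path)
print(normalized)

# Joined by hand: with an empty netloc the URL is just scheme + ':' + path.
rebuilt = 'ssh:' + normalized
print(rebuilt)
```

The double slash after the host is gone, which matches the difference 
between the two default paths in the session at the top.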

We should thus (probably) only use urlparse when we know it is an 
http/https URL - never with ssh URLs.


How should we resolve this issue?

One possible answer might be: There are quite well-defined use-cases for 
defining subrepos both with simple relative (s = s) and remote absolute 
(s = scheme://x/y) .hgsub mappings. All other mappings are so strange or 
problematic that we can't support them without getting into huge 
problems, and we should avoid the complexity of trying. The use-cases 
that were (badly) solved with these "other" mappings are probably better 
solved with the new subpaths remapping. So perhaps we should back 
a2bc2f2d77a9 out?

Or should we implement our own urlparse, for general use or only in this 
place?

Perhaps just something like

--- a/mercurial/subrepo.py
+++ b/mercurial/subrepo.py
@@ -156,11 +156,9 @@
         if '://' in parent:
             if parent[-1] == '/':
                 parent = parent[:-1]
-            r = urlparse.urlparse(parent + '/' + source)
-            r = urlparse.urlunparse((r[0], r[1],
-                                     posixpath.normpath(r[2]),
-                                     r[3], r[4], r[5]))
-            return r
+            url = parent + '/' + source
+            i = url.find('/', url.find(':') + 3) + 1
+            return url[:i] + posixpath.normpath(url[i:])
         return posixpath.normpath(os.path.join(parent, repo._subsource))
     if push and repo.ui.config('paths', 'default-push'):
         return repo.ui.config('paths', 'default-push', repo.root)
?
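
For anyone who wants to try the idea outside of Mercurial, here is the 
URL branch of the patch as a stand-alone sketch. The function name 
join_subrepo_url and the example URLs are hypothetical; it assumes the 
parent always looks like scheme://host/..., as in the '://' branch of 
_abssource.

```python
import posixpath

def join_subrepo_url(parent, source):
    """Hypothetical stand-alone version of the URL branch in the patch."""
    if parent.endswith('/'):
        parent = parent[:-1]
    url = parent + '/' + source
    # Index just past the first '/' that follows 'scheme://host'.
    i = url.find('/', url.find(':') + 3) + 1
    # Normalize only what comes after that slash, so the scheme, the host
    # and a possible second slash right after the host stay untouched.
    return url[:i] + posixpath.normpath(url[i:])

# The ssh case from the session at the top keeps its double slash.
print(join_subrepo_url('ssh://localhost//tmp/repo', 's'))
# A ..-relative source on an http parent is still collapsed.
print(join_subrepo_url('http://localhost/repo', '../other'))
```

This avoids urlparse entirely, so it doesn't depend on which schemes 
Python happens to know about.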

/Mads

