A case for subrepos with absolute URLs

Mads Kiilerich mads at kiilerich.com
Sun Dec 11 20:15:41 CST 2011


Arne Babenhauserheide wrote, On 12/11/2011 10:29 PM:
> Hi,
>
> We had some discussion on how bad absolute subrepo URLs are, but I now think that their problems are merely implementation details seeping through, which make life hard for those who actually use subrepos.
>
> Specifying an absolute path for a subrepo is a very simple way to specify an upstream. Thus, if you use Mercurial with subrepos that have absolute paths, it can become an advanced dependency tracking system.
>
> Starting to do that is as simple as can be: Clone your dependency, update it to the correct version and add it as subrepo. Then, when someone else gets your repository, he gets the same setup, including the upstream information (that’s what the subrepo-path is then: upstream information in the place where you expect it).
>
> Problems in this scheme come from only one source: what happens if you don’t have access to the upstream right now? And the only real problem you get is that you can no longer update to revisions that use versions of the upstream you don’t have yet.
>
> The reason for that is that Mercurial tries to guarantee a completely consistent state across subrepos, which creates strong coupling. And that is not tied to absolute URLs. It is rather a fundamental problem of strong coupling between separate Mercurial repositories.
>
> Even with relative paths: if we remove a subrepo and delete its relatively specified source repo a few years later, we can no longer go back to those old revisions.
>
> So the problem does not originate in absolute URLs, these just show the problem. It originates in the strong coupling.
>
> Because of that, I want to argue that Mercurial should not discourage the use of absolute URLs in subrepos, but should rather reduce the consistency requirement across subrepo boundaries. A few ideas:
>
>
> * Add a way to get subrepo revisions from the parent repo on pull in the same way as we can get them when cloning.
>
> * Try harder to find relatively specified subrepos by checking heuristics: often “subrepo” can be found at “../subrepo”.
>
> * Add the possibility of ignoring missing subrepos (this should make it impossible to change the corresponding substate without changing the subrepo to an existing subrepo source).
>
> * Add the possibility of ignoring missing revisions in an existing subrepo-source. Here we’d need some way to specify a new revision for the subrepo.
>
> * Maybe even add a place inside the .hg where we store all subrepos which any revision depended on, so we don’t need to be able to access them when we update to a revision which needs them. Hardlinks should make the cost of this negligible in most cases.
>
>
> Especially the last three points should reduce the coupling between the parent repo and the source of the subrepo, so subrepos should come closer to becoming first-class citizens in Mercurial.
>
> Best wishes,
> Arne
>
> PS: This even bit me on my own systems when pushing over ssh, because I had a separate target for the subrepo (to have it in my double-backed-up filesystem tree). Or I had to reinitialize a subrepo because the old one broke → inaccessible revisions of the parent repo ⇒ Never require perfection in any part of the system.
>

I think you have some valid points, but I also think you connect the 
dots incorrectly and draw an incorrect picture.

As you point out, the strong coupling is an issue not related to absolute 
paths. Issues with strong coupling are, for example, tracked (or at least 
reported) on http://mercurial.selenic.com/bts/issue2520 "Impossible to 
transition from a bad .hgsubstate", but there are also other similar 
open issues. It seems like this is the main topic of the mail. I don't 
have many comments on that before we have more specific proposals or 
patches.

I'm a bit puzzled why absolute URLs are also mentioned in the subject 
and throughout, but I will take the bait and comment a bit on that:

First of all: as things work now, using absolute subrepo URLs 
essentially makes Mercurial a centralized VCS, since every clone, 
wherever it came from, fetches the subrepo from the one server named 
in .hgsub. I agree that there are some valid use cases for a 
centralized VCS, and absolute URLs for subrepos might be a good 
solution in those cases. But the primary use case for Mercurial is as 
a distributed VCS, so in general it is bad advice to use subrepos with 
absolute URLs.
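
To make the centralization point concrete, here is a simplified sketch, 
assuming a model in which a relative subrepo source is joined onto the 
location the parent was cloned from, while an absolute URL is used 
verbatim. The function name resolve_subrepo_source and the example.com / 
example.org URLs are hypothetical; this is not Mercurial's actual code.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def resolve_subrepo_source(parent: str, source: str) -> str:
    """Hypothetical helper: where a clone of `parent` would look for a
    subrepo whose .hgsub source is `source` (simplified model)."""
    # Absolute URLs (with a scheme) and absolute local paths are used
    # verbatim, wherever the parent repository itself came from.
    if urlsplit(source).scheme or posixpath.isabs(source):
        return source
    # Relative sources are joined onto the parent's location and
    # normalized, so the subrepo follows the parent to any server.
    parts = urlsplit(parent)
    path = posixpath.normpath(parts.path + "/" + source)
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


# The same two .hgsub sources, seen from clones of two different servers.
for parent in ("https://central.example.com/app",
               "ssh://mirror.example.org/app"):
    print(resolve_subrepo_source(parent, "lib"),
          resolve_subrepo_source(parent, "https://central.example.com/lib"))
# -> https://central.example.com/app/lib https://central.example.com/lib
# -> ssh://mirror.example.org/app/lib https://central.example.com/lib
```

The relative source moves with the parent; the absolute one always 
points back at the central server, even for a clone of the mirror.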

Ok, you propose to redefine what a subrepo source is (or repeat some 
previous proposals made in a different age). That might mitigate the bad 
advice, but it also leaves us with a moving target as the topic of the 
discussion, which is hard to reason about.

Yes, external repos used as subrepos will have an upstream with an 
absolute url. I agree that it might be convenient to have that url 
tracked in the repo in some way, but that doesn't mean that the absolute 
url should be used as subrepo source. (I also think it is hard to 
imagine a well organized work flow where it is relevant for more than 1 
or 2 developers to introduce new upstream revisions. Having the upstream 
urls in a README might not be the worst solution.)

I think subpaths (as described on 
http://mercurial.selenic.com/wiki/Subrepository#Use_.27trivial.27_subrepo_paths_where_possible 
) provide a reasonably elegant solution to many problems in this area, 
not only a workaround.
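
A rough sketch of the remapping that trivial paths are combined with: 
the [subpaths] section of hgrc holds regular-expression rewrites that 
are applied to subrepo sources. The sketch assumes the rules run in 
order and each replaces only its first match; apply_subpaths and the 
URLs are hypothetical, and Mercurial's real code handles escaping 
details that this ignores.

```python
import re


def apply_subpaths(source, rules):
    """Hypothetical helper: rewrite a subrepo source with ordered
    (pattern, replacement) rules, loosely modeled on hgrc [subpaths]."""
    for pattern, repl in rules:
        # Rules run in order; each rewrites at most its first match.
        source = re.sub(pattern, repl, source, count=1)
    return source


# Roughly what a user-side hgrc entry like this would express:
#   [subpaths]
#   ^https://central\.example\.com/(.*)$ = ssh://mirror.example.org/\1
rules = [(r"^https://central\.example\.com/(.*)$",
          r"ssh://mirror.example.org/\1")]

print(apply_subpaths("https://central.example.com/lib", rules))
# -> ssh://mirror.example.org/lib
print(apply_subpaths("lib", rules))  # trivial path: no rule matches
# -> lib
```

Because the rewrite lives in the user's hgrc rather than in .hgsub, 
each user can redirect subrepo sources without changing history.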

Anyway: This is mainly a matter of making it possible to control what 
path (default or something else) is put in .hg/hgrc of subrepo clones. 
This path is rarely used anyway, so some .hgsub syntax for controlling 
that wouldn't hurt ... but I think it will add complexity for no benefit.

/Mads

More information about the Mercurial-devel mailing list