[PATCH] subrepo: look for local pull source first

Martin Geisler mg at aragost.com
Fri Mar 25 03:36:44 CDT 2011


Mads Kiilerich <mads at kiilerich.com> writes:

Hi Mads,

You ask an awful lot of questions :) -- let me go through them.

> On 03/24/2011 07:16 PM, Martin Geisler wrote:
>> # HG changeset patch
>> # User Martin Geisler<mg at aragost.com>
>> # Date 1300989966 -3600
>> # Node ID 4757320187407df151d680613c991f653ec0c3ec
>> # Parent  78a0a815fd41d794550d12428362cd51b261c1c6
>> subrepo: look for local pull source first
>
> You really mean local? Why restrict it to local? Why can't the same
> method be used for all paths?

The goal is to speed up 'hg clone a b' where repo a has big
subrepositories that refer to external (SSH or HTTP) repositories. The
idea is that if we can find a repository on the filesystem that already
has the changeset we need, then we will just use that.
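
As a rough sketch of that lookup order (this is not the patch itself:
the function name, its parameters, the has_changeset callback and the
example.com URL are all made up for illustration, and the "://" test is
a deliberately crude stand-in for "is this a filesystem path?"):

```python
import posixpath


def pick_pull_source(parent_source, subpath, hgsub_source, has_changeset):
    """Pick where a subrepo should pull from.

    Hypothetical helper: has_changeset(source) is supplied by the caller
    and reports whether the repository at `source` already contains the
    changeset we need. None of this is Mercurial API.
    """
    # Only look in place when the parent was pulled from the filesystem.
    # Assumption: anything containing "://" is treated as remote.
    if "://" not in parent_source:
        candidate = posixpath.join(parent_source, subpath)
        if has_changeset(candidate):
            return candidate
    # Otherwise use the canonical source listed in .hgsub.
    return hgsub_source


# 'hg clone repo clone' with repo/sub present: the in-place copy wins.
source = pick_pull_source(
    "repo", "sub", "http://example.com/sub",
    has_changeset=lambda path: path == "repo/sub",
)
print(source)
```

If the in-place repository is missing or lacks the changeset, the same
call simply returns the .hgsub source, so the result for the user is
the same either way.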

What I would really like is for 'hg pull' to hardlink revlogs whenever
it can, in general. That way 'hg clone' would be reduced to 'hg init' +
'hg pull' in the code and all pulls into empty repositories would get
hardlinks.

>> With this change, a subrepository will always try to find the
>> changesets it needs in a repository relative to the pull source of
>> the top-most parent repository. So if we have
>>
>>    repo/
>>      sub/
>>
>> and make a clone with 'hg clone repo clone', then clone/sub will pull
>> in changesets from repo/sub, regardless of what the .hgsub file says.
>
> It should be noted explicitly that this is a change of behavior and
> not really a bug fix.

Yes, it is not a bugfix. It just makes some things faster, without
changing any of the semantics. When I talk to people about this, they
all find it strange that Mercurial did not look for the subrepository
in-place. After all, "it's right there!" :)

> It will kick in every time an unknown hg subrepo revision is requested
> and it thus tries to make a pull _and_ the top repo happens to have a
> local default path.

Yes, that is correct.

> What is the rationale for doing it for hg subrepos only?

No rationale, except that I started with what I know best. It could be
done for Git subrepos too, but not for SVN since you really have to talk
to the one central server there.

> FWIW I don't like command line tools that do trial-and-error. It
> should be 100% predictable what a command line tool will do. DWIM is
> nice when it does the right thing, but it also makes it harder to
> understand and learn the tool and find usage errors.

Well, it is predictable what the *result* is: you get the revision you
are looking for.

>> This will allow you to make clones of repositories where the .hgsub
>> file uses 'sub = ../sub' paths. The problem with these repositories
>> is that they are structurally different from where they were cloned
>> from. By looking for the subrepository in-place, we avoid this
>> problem.
>>
>> It will also allow you to make clones while offline, even if a
>> repository uses subrepositories that are specified with remote URLs.
>
> This will effectively change non-trivial relative paths to trivial
> paths, right? I'm all for deprecating and discouraging use of
> non-trivial and absolute paths, but why should we change the semantics
> of them? Why can't people just start using trivial paths?

You cannot use trivial 'sub = sub' paths on a service like Bitbucket.

> Wasn't it the plan that it should be possible to use subpaths to use
> other paths than the ones in .hgsub?

Yes, that was the plan, and my client does use the [subpaths] section,
mostly for remapping '^http:// = ssh://' for external contractors who
don't have access to the internal HTTP URLs.

But it is not easy to remap 70 paths back to the trivial paths when you
just want to do a local clone.
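
To illustrate the kind of remapping meant here, a minimal sketch that
treats each [subpaths] entry as a regular expression and a replacement
applied to the source from .hgsub. The remap helper and the
hg.example.com URL are assumptions for the example, and Mercurial's
real handling may differ in details:

```python
import re


def remap(source, rules):
    """Apply each (pattern, replacement) rule in order using re.sub.

    Hypothetical helper, only meant to show the idea of rewriting a
    subrepo source; it is not how Mercurial is implemented.
    """
    for pattern, replacement in rules:
        source = re.sub(pattern, replacement, source)
    return source


# The '^http:// = ssh://' rule mentioned above, as a (pattern, repl) pair.
rules = [(r"^http://", "ssh://")]
print(remap("http://hg.example.com/libs/sub", rules))
```

One rule like this covers every HTTP source at once. Mapping 70
unrelated sources back to their in-place paths would instead need
roughly one rule per subrepository, which is the inconvenience
described above.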

> If we want this I think it should be guarded by a command line option.
> But because it will kick in when nobody expects it it would perhaps be
> more convenient with an ugly config setting.

It does not change anything from the user's perspective: he gets the
revision he asked for.

I'll grant you this: he might get other additional changesets than if he
had pulled from the source listed in .hgsub. Currently, Mercurial will
pull in all changesets from the repo listed in .hgsub; with my patch it
will pull in all changesets from the in-place subrepo.

We could change it to pull the precise revision we want from the
in-place subrepo and then do a full pull from the .hgsub subrepo. I
don't like this since it destroys the cool use case where you make a
clone and the canonical subrepo (from .hgsub) is offline.

> I also wonder if it wouldn't be better to try to do what the subrepo
> config says before we fall back to try other sources.

The whole idea is that .hgsub contains the canonical path of the
subrepository. But if we have local access to a repository with the
right changeset, then it is stupid not to use that one.

-- 
Martin Geisler

aragost Trifork
Professional Mercurial support
http://aragost.com/en/services/mercurial/blog/

