[PATCH] subrepo: do not push "clean" subrepos when the parent repo is pushed

Angel Ezquerra angel.ezquerra at gmail.com
Sat Feb 16 04:34:09 CST 2013

On Sat, Feb 16, 2013 at 6:03 AM, Matt Harbison <matt_harbison at yahoo.com> wrote:
> Angel Ezquerra wrote:
>> On Fri, Feb 15, 2013 at 4:49 AM, Matt Harbison<matt_harbison at yahoo.com>
>> wrote:
>>> On Thu, 14 Feb 2013 01:13:44 +0100, Angel Ezquerra wrote:
>>>> On Thu, Feb 14, 2013 at 1:06 AM, Angel Ezquerra
>>>> <angel.ezquerra at gmail.com>  wrote:
>>> ...
>>>> This is step one in the plan that Matt, Martin and I discussed to
>>>> improve subrepos during the London sprint.
>>> Is there a brief overview of the plan somewhere?
>> I wrote a summary on the titanpad that we used during the sprint:
>> http://titanpad.com/mercurial26
>> I've taken that and updated it a little with what has been discussed
>> in this thread and my own thinking since then:
> Thanks for the writeup.  I think I understand most of this, up to the
> caching and deletable parts.  But that seems a bit further into the future,
> and maybe it will become more clear as this evolves.
> I'm not sure if deletable helps this, and I haven't looked into it yet, but
> are there any plans to support updating between revs where a file exists in
> one rev and a subrepo occupies the same directory in the other? (I think
> this is what issue3131 is about.)

Yes, that is one of the issues that this is trying to address.

Let me try to explain the "caching and deletable" part a little better:

The idea is that updating to a revision that does not refer to a subrepo should
remove the subrepo from the working directory. That would resolve issue3131,
among other things.

This requires fixing several issues:

1. Mercurial should not delete a subrepo that has untracked changes.
2. Mercurial should not delete a subrepo that has "unsynchronized" changes.
3. Mercurial should not really delete a subrepo, but "cache it". Otherwise
running hg update 000 on your central repo would delete all your subrepos
as well!
4. Mercurial should be able to pull from a cached subrepo when the actual
subrepo is not found in the remote repository's working directory.

This is what the "deletable" and "caching" parts of the plan are meant to fix.
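The four requirements above can be pictured as a small model. This is a minimal sketch under stated assumptions, not Mercurial code: SubrepoStatus, can_remove() and cache_subrepo() are hypothetical names, and how the two flags would really be computed (for example via the planned cleanstore() check) is left open.

```python
import os
import shutil
from dataclasses import dataclass


@dataclass
class SubrepoStatus:
    """Hypothetical summary of a subrepo's state (illustrative names only)."""
    path: str                   # path relative to the parent repo root
    has_local_changes: bool     # modified or untracked files (point 1)
    has_unsynced_changes: bool  # store changes not yet pushed anywhere (point 2)


def can_remove(status):
    """Return (ok, reason); removal is only safe if nothing would be lost."""
    if status.has_local_changes:
        return False, "working directory has local changes"
    if status.has_unsynced_changes:
        return False, "store has changes that were never pushed"
    return True, "clean"


def cache_subrepo(status, root, cachedir):
    """Move a removable subrepo into a cache instead of deleting it (point 3).

    The cached copy is what a later update, or a pull from this repository,
    could be served from (point 4)."""
    ok, reason = can_remove(status)
    if not ok:
        raise RuntimeError("refusing to remove %s: %s" % (status.path, reason))
    src = os.path.join(root, status.path)
    dst = os.path.join(cachedir, status.path)
    if os.path.exists(dst):
        raise RuntimeError("cache entry already exists: %s" % dst)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.move(src, dst)
    return dst
```

Refusing rather than forcing keeps the failure mode conservative: the worst case is a subrepo left in the working directory, never lost work.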

>> * Discuss ways to improve some of the subrepo pain points - angel, mg
>>    - add way to pull subrepos with hg pull
>>    - just push changed subrepos
>>    - subrepos are eternal
>>    - We had a discussion with mpm on this and we believe we have a good
>> plan:
>>        - Add -S/--subrepos flag to hg pull
>>        - Add cleanstore() method to subrepos which can tell pull/push
>> if a subrepo store has changes. If not we can ignore it during push
>>            - The cleanstore method would check a timestamp or sha1 of
>> the bookmarks, phaseroots and the changelog, but not the
>> dirstate
>>          - Updated on clone and push; but also on pull if the store is
>> already clean
>>        - Only push subrepos for which cleanstore() is False
>>        - subrepo.deletable(): if self.clean() and dirstate working dir is
>> clean()
>>        - fix the merge bug which makes subrepos stay in the working
>> directory on update even if the parent has files that should go
>> in the folder occupied by the subrepo
>>            - using deletable() we could remove subrepos from the working
>> dir...
>>                - however this could cause data loss on update on a
>> "central" repository containing relative subrepos. On non-central
>> subrepos it would require recloning the subrepo when updating back to
>> a revision referring to that subrepo.
>>         - Introduce subrepo caching (this would fix the problem above
>> and improve pull time considerably when subrepos are deleted)
>>              - Cloning would need to be smart enough to look in the
>> subrepo cache. Otherwise it would not be possible to pull from central
>> repositories that are at revision -1 (since all subrepos would be
>> in the cache, and none in the working directory).
>>     - Document subpaths patterns that can make relative subrepos work
>> with bitbucket and google code
>> Perhaps it would be worth adding this to the wiki?
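A rough sketch of the proposed cleanstore() check, assuming a SHA-1 fingerprint rather than timestamps. The file list (bookmarks, store/phaseroots and store/00changelog.i under .hg) assumes the usual repository layout; the stamp file name and record_clean() are hypothetical. The dirstate is deliberately excluded, as the plan says.

```python
import hashlib
import os

# Files that define the "store state" per the plan. Paths are relative to the
# .hg directory and assume the usual layout (an assumption, not a spec).
STORE_FILES = ("store/00changelog.i", "store/phaseroots", "bookmarks")

# Hypothetical file recording the fingerprint of the last synchronized state.
STAMP_FILE = "lastsyncedstore"


def store_fingerprint(hgdir, files=STORE_FILES):
    """SHA-1 over the listed files; a missing file hashes like an empty one."""
    h = hashlib.sha1()
    for name in files:
        h.update(name.encode("utf-8") + b"\0")
        path = os.path.join(hgdir, name)
        if os.path.exists(path):
            with open(path, "rb") as fp:
                h.update(fp.read())
        h.update(b"\0")
    return h.hexdigest()


def record_clean(hgdir):
    """Call after clone or push (or after a pull into an already clean store)."""
    with open(os.path.join(hgdir, STAMP_FILE), "w") as fp:
        fp.write(store_fingerprint(hgdir))


def cleanstore(hgdir):
    """True if the store is unchanged since it was last recorded as clean.

    With no recorded stamp the answer is False, so push errs on the side of
    pushing the subrepo."""
    path = os.path.join(hgdir, STAMP_FILE)
    if not os.path.exists(path):
        return False
    with open(path) as fp:
        return fp.read().strip() == store_fingerprint(hgdir)
```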
>>>> I also am working on step two of the plan, which was to add a
>>>> "--subrepos" flag to hg pull.
>>> Is this your idea about passing (some?) parameters to subrepos [1]?  If
>>> so, does 'outgoing' need the same method of filtering the option dict [2]
>>> for consistency?  (I was a bit surprised that outgoing -S passes along
>>> the --rev option, which causes it to abort in the subrepo with a (parent)
>>> hash, or lie or abort if given a rev.)  There are also a couple of bugs
>>> written up about --addremove not being passed along, so what to pass or
>>> not seems like a wider (general?) problem.
>> The way I've implemented it is very similar to how the current
>> subrepo.get() works. Basically I want hg pull --subrepos to behave as
>> if you first did "hg pull" and then you did "hg update -r" for every
>> new revision that "hg pull" brought into the repository. The idea is
>> to make sure that you are able to update to any new revision without
>> needing to have any network access (i.e. that the repository is
>> self-contained after doing hg pull --subrepos, as long as it already
>> was self-contained before).
> I really like this capability.

Glad you like it. I think it will be very handy for heavy subrepo users.
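One way to picture the self-containment goal: after pulling, read the .hgsubstate of every newly pulled parent revision and collect the subrepo revisions that are not yet available locally. The helper names are hypothetical; the sketch handles a single level of subrepos and assumes the "<revision> <path>" line format of .hgsubstate.

```python
def parse_substate(text):
    """Parse .hgsubstate content: one '<revision> <path>' pair per line."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        rev, path = line.split(" ", 1)  # paths may contain spaces
        state[path] = rev
    return state


def subrepo_revs_to_pull(new_substates, have):
    """Return {subrepo path: set of revisions still missing locally}.

    new_substates holds the .hgsubstate text of each newly pulled parent
    revision; have maps a subrepo path to the revisions it already contains."""
    needed = {}
    for text in new_substates:
        for path, rev in parse_substate(text).items():
            if rev not in have.get(path, set()):
                needed.setdefault(path, set()).add(rev)
    return needed
```

A full implementation would then pull exactly those revisions into each subrepo and recurse into nested subrepos.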

>> Martin Geisler helped me with this patch during the sprint and he has
>> been helping me since then. I'll send a patch soon.
>> That being said, out of the options that pull takes there are a lot
>> that do not make sense when pulling from subrepos, particularly --rev,
>> --bookmark, --branch and especially --rebase.
> Aren't --bookmark and --branch simply opts that predate the bookmark() and
> branch() revsets, without any special semantics?  (I'm wondering
> specifically about bookmarks because I don't use them much, and IDK if
> --bookmark + --update does anything special that --rev + --update doesn't.)
> One of the things that seemed like it would be useful when trying to fix
> push -r is to have the methods in commands.py translate these options to a
> list of revs there, and pass that along.  That way, every repo (even without
> subrepos) gets a list of revs instead of these various opts. The subrepo
> layer would have to translate to child revs before passing them on, but that
> doesn't seem terribly difficult.

I think that the subrepo update should be linked to the revisions that are
pointed to by the parent repository revision. If you want to update your
subrepos independently of the parent repo, the best solution IMHO is to
use the onsub extension. Then you can pass a revset as you suggest, which
will be interpreted by every individual command that is executed on each
subrepo.

> I don't use --rebase, but I can see how that would be bad.
>> I think that if you need
>> to do something special while pulling a subrepo perhaps it would be
>> best to get into the subrepo and do a regular pull, or perhaps use the
>> onsub extension (which I wish were integrated into Mercurial). One thing
>> that would help there would be to have a way to refer to the parent
>> paths when pulling from within a subrepo. This would be particularly
>> handy when using relative subrepo definitions, as you should.
> I'm not sure what you mean about referring to parent paths.  Do you have a
> use case in mind?

This is a bit unrelated to the rest of the thread. I guess you need to know
how the onsub extension works to understand what I meant. The onsub
extension simply executes a command on every subrepo.

For example:

    hg onsub "hg pull"

This will run hg pull on every subrepo, using the default path of every subrepo.
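Conceptually, onsub boils down to a loop like the following. It is a simplified model, not the extension's source: it reads only the top-level "path = source" entries of .hgsub and exports HG_SUBURL as the raw .hgsub source. HG_SUBPATH and the injectable run parameter are assumptions added for illustration and testing.

```python
import os
import subprocess


def parse_hgsub(text):
    """Minimal .hgsub reader: 'path = source' lines before the first section
    header (such as [subpaths]); blank lines and comments are skipped."""
    subs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("["):
            break  # entries under [subpaths] are remappings, not subrepos
        path, source = (part.strip() for part in line.split("=", 1))
        subs.append((path, source))
    return subs


def onsub(root, command, hgsub_text, run=subprocess.run):
    """Run a shell command inside every subrepo listed in hgsub_text."""
    results = []
    for path, source in parse_hgsub(hgsub_text):
        env = dict(os.environ, HG_SUBPATH=path, HG_SUBURL=source)
        results.append(run(command, shell=True,
                           cwd=os.path.join(root, path), env=env))
    return results
```

Note that the command itself (here `hg pull`) runs inside the subrepo, so it only sees the subrepo's own hgrc.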

When a subrepo is first created, its hgrc file is created automatically, and
a default path (and possibly a default-push path) is added to it. This default
path is the one the subrepo was cloned from, as specified in the parent
repository's .hgsub file. The example command above would use that default
path when running hg pull on a given subrepo.

The problem is that if the subrepo is defined with a relative sync URL (as
it should be) and you then change the default path on the parent repo, the
pull source used when you pull a subrepo through its parent repo and the
pull source used when you pull from within the subrepo itself (as the onsub
extension does) are no longer the same.
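A simplified model of how a relative .hgsub source is resolved against the parent's default path makes the mismatch concrete. resolve_subsource() is a hypothetical name, and the logic approximates Mercurial's behaviour (treat the parent path as a directory, join, then collapse ".." segments) rather than reproducing it; the example.com hosts are placeholders.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def resolve_subsource(parent_default, subsource):
    """Resolve a .hgsub source against the parent's default path.

    Absolute URLs and absolute local paths are returned unchanged."""
    if urlsplit(subsource).scheme or subsource.startswith("/"):
        return subsource
    parts = urlsplit(parent_default)
    # The parent path acts as a directory: join, then collapse '..' segments.
    path = posixpath.normpath(posixpath.join(parts.path, subsource))
    return urlunsplit(parts._replace(path=path))
```

After a clone from `https://hg.example.com/repos/main`, the subrepo `../lib` records `https://hg.example.com/repos/lib` as its own default. If the parent's default is later switched to `https://mirror.example.org/hg/main`, pulling through the parent resolves to `https://mirror.example.org/hg/lib`, while `hg pull` run inside the subrepo still goes to the old host.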

That is why I would like the onsub extension to get a way to "access" the
parent repository sync paths. The extension already provides a
$HG_SUBURL environment variable, but it does not provide a way to get
the parent repo default (or other) path.
