[PATCH 1 of 3] pull: add --subrepos flag

Wed Feb 20 22:55:22 CST 2013

Angel Ezquerra wrote:
> On Wed, Feb 20, 2013 at 6:57 AM, Matt
> Harbison<matt_harbison at yahoo.com>  wrote:
>> On Sun, 17 Feb 2013 13:19:16 +0100, Angel Ezquerra wrote:
>>
>>> # HG changeset patch
>>> # User Angel Ezquerra<angel.ezquerra at gmail.com>
>>> # Date 1360519226 -3600
>>> # Node ID abbd26cca35280fb8f784b3f2c02eef71696c47b
>>> # Parent 55b9b294b7544a6a144f627f71f4b770907d5a98
>>> pull: add --subrepos flag
>>>
>>> The purpose of this new flag is to ensure that you are able to
>>> update to any incoming revision without requiring any network
>>> access. The idea is to make sure that the repository is
>>> self-contained after doing hg pull --subrepos (as long as it
>>> already was self-contained before the pull).
>>>
>>> When the --subrepos flag is enabled, pull will also pull (or
>>> clone) all subrepos that are present on the current revision and
>>> those that are referenced by any of the incoming revisions.
>> I haven't gotten a chance to really play with this yet, so I'm
>> going more off the comments here- I apologize if these answers
>> should be obvious, but I'm not familiar enough with some of the
>> code.
>>
>> - Is there an easy way to tell if the repo is/was self contained?
>> (Maybe incoming -S?)
>
> No there is not. I don't think incoming -S would do the trick since
> that would just tell you if there are _new_ incoming revisions on
> some of the _current_ subrepos. A repo is "self-contained" if it is
> possible to update to any of its revisions without requiring a pull
> of one or more of its subrepos.
>
> I don't know of any existing mercurial command that would be able to
> give you that information.
>
>> - Is the 'self-contained' bit to limit overhead on each pull, or is
>> there another reason this can't ensure the result is self
>> contained?  'Push' and 'outgoing -S' recognize (almost) everything
>> going in the other direction, so it might be nice to have the same
>> capability with a form of pull.  (I may have found a push bug that
>> I haven't gotten back to yet.)
>
> I'm not sure I understand what you mean.

Consider this (contrived) case:

   1) push a repo and subrepo to remote (remote is now self contained)
   2) strip or rollback the remote subrepo
   3) a top level local repo push will repopulate the remote subrepo

Now reverse it:

   1) pull a repo and subrepo from remote (local is self contained)
   2) strip or rollback the local subrepo
   3) nothing is incoming at the top level, so the subrepo isn't
      repopulated (if you've updated to a working dir without that
      subrepo).
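
To make the reversed case concrete, here is a toy model in Python. It is
only a sketch: the dictionaries and function names are made up for
illustration and are not Mercurial APIs. It assumes the check looks at
the incoming revisions plus the current one, as the patch description
says, and compares that with a walk over the whole parent history.

```python
# Toy model only: none of these names are Mercurial APIs.

def referenced_subrevs(parent_history, revs):
    """Return the (subrepo path, subrepo node) pairs pinned by the given parent revs."""
    pairs = set()
    for rev in revs:
        pairs.update(parent_history[rev].items())
    return pairs


def missing_subrevs(parent_history, revs, local_subrepos):
    """Return pinned pairs whose subrepo node is absent from the local store."""
    return {
        (path, node)
        for path, node in referenced_subrevs(parent_history, revs)
        if node not in local_subrepos.get(path, set())
    }


# Parent rev -> {subrepo path: pinned subrepo node}
parent_history = {0: {"sub": "a"}, 1: {"sub": "b"}}

# Step 1: after the initial pull, everything pinned is present locally.
local_subrepos = {"sub": {"a", "b"}}

# Step 2: strip the local subrepo back to node "a".
local_subrepos["sub"].discard("b")

# Step 3: nothing is incoming, and the working dir sits on parent rev 0,
# so a check limited to incoming + current revisions sees no gap...
incoming = []
current_rev = 0
limited = missing_subrevs(parent_history, incoming + [current_rev], local_subrepos)

# ...while a walk over the full parent history does.
full = missing_subrevs(parent_history, list(parent_history), local_subrepos)
```

Here `limited` comes back empty while `full` reports the stripped node,
which is the asymmetry with push that I was wondering about.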

I'm not sure if there's a less contrived case, or if this matters too
much.  I guess I was just wondering aloud about the symmetry between
push and pull -S (this is certainly much better than it was).

I realize you can't do such a thing without crawling most of the
history.  Are there large public repositories that use subrepos?  I'm
wondering what the performance hit would be.  (It's easy for me to think
something is a good idea when I only have small repos and wouldn't
notice the hit.)

FWIW, largefiles works the same way- if you don't clone or pull with
--all-largefiles, there's no single command to go back and get the files
for all revisions that are not incoming.  That leaves the user wondering
if they really can disconnect from the central repo.

> I don't think you (we?) must give too much importance to this
> "self-contained" concept. It is just a way for me to explain the
> purpose of the patch, and especially to explain why we must look for
> subrepos on all the new incoming revisions, and why we cannot just
> limit ourselves to pulling the subrepos on the current revisions
> (short answer: because new subrepos may appear on the new, incoming
> revisions).
>
> My patch explicitly says that hg pull -S will only make your subrepo
> self-contained if it was already self-contained before. This is in
> order to avoid having to look for subrepos on all the repo history,
> rather than just looking for subrepos on the incoming revision (and
> the current one).
>
>> - The full subrepo gets pulled, even revs not committed to the
>> parent?  I think that's a good thing, because I regularly get burned
>> when I 'pull -u' the tree to another machine and then go to apply
>> the rest of a patch queue to the subrepo.
>
> Yes. It is perhaps not optimal but I think it is simpler. In
> addition if different parent repo revisions point to different
> revisions on a subrepo there is no way for us to tell which of those
> subrepo revisions is the one that is closest to tip, or which ones
> are ancestors of the other ones, etc. As a result we would need to
> perform as many pulls on a given repo as the number of different
> revisions of that subrepo that were referenced on the parent repo.
> That is complex and slow, so it is much simpler and possibly faster
> (in some cases at least) to just pull all revisions from each
> subrepo.

OK, I misread the code- I thought each subrepo was getting a pull at
each revision, which I figured would be slow.  I attached a test patch
below- there's nothing special about it, but it helped me with my pull
and outgoing changes (some comments probably still reflect this), so I
changed that to pull and incoming to test your patch.

- I think I see a double pull of a subrepo (search for "hg pull -S -r 3").

- I wonder if the code in this patch can be leveraged to make incoming
print all of the stuff 'pull -S' will grab in the future.
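
On the double pull: I haven't traced where it comes from in the patch,
so this is a guess. If the subrepo is visited once per incoming revision
that references it, that would explain the repeated output. A minimal
sketch of one way to collapse that, using made-up names and plain dicts
rather than anything from Mercurial:

```python
# Hypothetical helper, not part of Mercurial or of the patch under review.

def subrepos_to_pull(current_subs, incoming_subs_per_rev):
    """Return each (path, source) pair once, in first-seen order."""
    seen = set()
    ordered = []
    for subs in [current_subs, *incoming_subs_per_rev]:
        for path, source in subs.items():
            key = (path, source)
            if key not in seen:
                seen.add(key)
                ordered.append(key)
    return ordered


# Two incoming parent revisions that both reference the same subrepo.
current = {}
incoming = [{"sub": "src/sub"}, {"sub": "src/sub"}]
targets = subrepos_to_pull(current, incoming)
```

With the input above, `targets` holds a single entry for `sub`, so the
subrepo would be pulled once no matter how many revisions mention it.
The same list could in principle feed an 'incoming -S' style report.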

>> I'll try to experiment with this some in the next few days.  I ran
>> into issues with what I'm working on (push, outgoing) with deeply
>> nested subrepos, and also when a parent locks in an earlier subrepo
>> version.  I wonder if deeply nested subrepos will be a problem here
>> since hgsubrepo.pull() doesn't walk its subrepos and pull them.
>
> I must confess that I have not tried that too much. We should
> definitely do this recursively. That being said I hope to get some
> feedback on the current version that I sent to the list first.

Sorry, I got crossed up on that too.  hgsubrepo.pull() ends up calling
_repo.pull(), so it does recurse.  The test below indicates that clone
won't recurse- it only reminds you that an update is needed.  (Maybe
clone needs a -S too as part of these changes?  If you aren't walking the history, I
don't see a way around that because nothing is incoming after a clone,
so you won't see subrepos of a cloned subrepo.)
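
For the clone case, the nested subrepos would have to be found by reading
the .hgsub of each cloned subrepo rather than by looking at incoming
revisions. A rough sketch of that walk follows. It assumes a simplified
.hgsub made only of 'path = source' lines and uses an in-memory dict in
place of real repositories; the function names are made up and this is
not how subrepo.py does it.

```python
import posixpath


def parse_hgsub(text):
    """Parse simplified 'path = source' lines, skipping blanks and comments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        path, _, source = line.partition("=")
        entries[path.strip()] = source.strip()
    return entries


def walk_subrepos(hgsub_by_repo, root=""):
    """Yield every nested subrepo path under root, depth first."""
    for path in parse_hgsub(hgsub_by_repo.get(root, "")):
        full = posixpath.join(root, path) if root else path
        yield full
        yield from walk_subrepos(hgsub_by_repo, full)


# Same layout as the test below: sub, with a grandchild sub/subsub.
hgsub_by_repo = {
    "": "sub = sub\n",
    "sub": "subsub = subsub\n",
}
nested = list(walk_subrepos(hgsub_by_repo))
```

For the layout used in the test this yields `sub` and then `sub/subsub`,
which is the set a 'clone -S' would need to visit.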

The other thing worth a test is largefiles- the --all-largefiles
option isn't passed to subrepos, so as it stands, 'pull -S' won't let
you really disconnect, because largefiles in subrepos won't be cached. 
That can be fixed later.

> Cheers,
>
> Angel
>

# HG changeset patch
# Parent 5515fe1a8cbf0097440d327e901ddcaa7f1afa04
# User Matt Harbison <matt_harbison at yahoo.com>
# Date 1361421052 18000

tests: adapt various 'push -S -r' tests to 'pull -S'

diff --git a/tests/test-issue2314.t b/tests/test-issue2314.t
new file mode 100644
--- /dev/null
+++ b/tests/test-issue2314.t
@@ -0,0 +1,174 @@
+  $ cat >> $HGRCPATH <<EOF
+  > [alias]
+  > slog = log --template '{rev}:{node|short} {desc|firstline}\n'
+  > sout = outgoing --template '{rev}:{node|short} {desc|firstline}\n'
+  > sin  = incoming --template '{rev}:{node|short} {desc|firstline}\n'
+  > EOF
+
+It appears that if the subrepo doesn't exist at the time of the clone,
+both out and push will abort.  I thought this worked...  Avoid the
+issue for now by initing an empty repo where the subrepo will go
+  $ hg init dest
+# XXX: pull aborts even though these are empty?!
+#  $ hg init dest/sub
+#  $ hg init dest/sub/subsub
+  $ hg clone -q dest src
+  $ hg init src/sub
+  $ cd src
+
+  $ echo '0' > sub/foo.txt
+  $ hg add -R sub sub/foo.txt
+  $ hg ci  -R sub -m "standalone subrepo commit"
+
+  $ echo '1' > sub/foo.txt
+  $ echo 'sub = sub' > .hgsub
+  $ hg add .hgsub
+  $ hg ci -S -m "lock in subrepo @ 1"
+  committing subrepository sub
+
+Add a grandchild to test all paths
+  $ hg init sub/subsub
+  $ echo 'subsub' > sub/subsub/bar.txt
+  $ hg add -R sub/subsub sub/subsub/bar.txt
+  $ echo 'subsub = subsub' > sub/.hgsub
+  $ hg add -R sub sub/.hgsub
+
+  $ echo '2' > sub/foo.txt
+  $ hg ci -S -m "lock in subrepo @ 2"
+  committing subrepository sub
+  committing subrepository sub\subsub
+
+  $ echo '3' > sub/foo.txt
+  $ hg ci -R sub -m "standalone subrepo commit @ 3"
+
+Add a grandchild to test all paths
+XXX: This subrepo won't get pushed, even with current code!!
+  $ hg init sub/phantom
+  $ echo 'phantom' > sub/phantom/bar.txt
+  $ hg add -R sub/phantom sub/phantom/bar.txt
+# XXX: this should be '>>' instead of '>'
+  $ echo 'phantom = phantom' > sub/.hgsub
+  $ echo '4' > sub/foo.txt
+  $ hg ci -S -m "lock in subrepo @ 4 (shouldn't be seen in subrepo listing with push -r 1)"
+  committing subrepository sub
+  committing subrepository sub\phantom
+  $ echo '5' > sub/foo.txt
+  $ hg ci -R sub -m "standalone subrepo commit @ 5"
+
+Change the subrepo back to a previous rev to make sure that if tip of
+parent is pushed, all of the required csets in the subrepo are also
+pushed regardless of what the parent tip has locked in.
+  $ hg up -R sub -r 2
+  3 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  $ hg ci -S -m "lock in subrepo @ 2 (again)"
+#  $ echo '2.1' > sub/foo.txt
+#  $ hg ci -R sub -m "standalone subrepo commit @ 2.1"
+
+The tree built in src
+  $ hg slog
+  3:b6ef204e81fa lock in subrepo @ 2 (again)
+  2:0aa0d45d1538 lock in subrepo @ 4 (shouldn't be seen in subrepo listing with push -r 1)
+  1:000653a98f7b lock in subrepo @ 2
+  0:c4c54d98a77b lock in subrepo @ 1
+The tree in src/sub
+  $ hg slog -R sub
+  5:5b1b79eebbd2 standalone subrepo commit @ 5
+  4:cac37e36a35f lock in subrepo @ 4 (shouldn't be seen in subrepo listing with push -r 1)
+  3:4aef09d75093 standalone subrepo commit @ 3
+  2:f9f5c072ebeb lock in subrepo @ 2
+  1:30154f8598be lock in subrepo @ 1
+  0:5879cf081918 standalone subrepo commit
+
+
+Test the various forms of 'incoming'
+
+The current form where entire subrepo is pushed
+  $ hg sin -S --config paths.default=. -R ../dest
+  comparing with $TESTTMP\src
+  0:c4c54d98a77b lock in subrepo @ 1
+  1:000653a98f7b lock in subrepo @ 2
+  2:0aa0d45d1538 lock in subrepo @ 4 (shouldn't be seen in subrepo listing with push -r 1)
+  3:b6ef204e81fa lock in subrepo @ 2 (again)
+
+This shouldn't push all of the subrepo
+  $ hg sin -S -r 1 --config paths.default=. -R ../dest
+  comparing with $TESTTMP\src
+  0:c4c54d98a77b lock in subrepo @ 1
+  1:000653a98f7b lock in subrepo @ 2
+
+This should push all of the subrepo that has been locked in
+(i.e. NOT @ 5)
+  $ hg sin -S -r tip  --config paths.default=. -R ../dest
+  comparing with $TESTTMP\src
+  0:c4c54d98a77b lock in subrepo @ 1
+  1:000653a98f7b lock in subrepo @ 2
+  2:0aa0d45d1538 lock in subrepo @ 4 (shouldn't be seen in subrepo listing with push -r 1)
+  3:b6ef204e81fa lock in subrepo @ 2 (again)
+
+This should want to pull the subsub repo
+  $ hg pull -S -r 1  --config paths.default=. -R ../dest
+  pulling from $TESTTMP\src
+  adding changesets
+  adding manifests
+  adding file changes
+  added 2 changesets with 3 changes to 2 files
+  cloning subrepo sub from $TESTTMP/src/sub
+  pulling subrepo sub from $TESTTMP/src/sub
+  searching for changes
+  no changes found
+  (run 'hg update' to get a working copy)
+
+  $ hg slog -R ../dest
+  1:000653a98f7b lock in subrepo @ 2
+  0:c4c54d98a77b lock in subrepo @ 1
+  $ hg slog -R ../dest/sub
+  5:5b1b79eebbd2 standalone subrepo commit @ 5
+  4:cac37e36a35f lock in subrepo @ 4 (shouldn't be seen in subrepo listing with push -r 1)
+  3:4aef09d75093 standalone subrepo commit @ 3
+  2:f9f5c072ebeb lock in subrepo @ 2
+  1:30154f8598be lock in subrepo @ 1
+  0:5879cf081918 standalone subrepo commit
+
+This should want to pull subrepo rev 4, even though rev 2 is locked in
+here
+  $ hg sin -S -r 3  --config paths.default=. -R ../dest
+  comparing with $TESTTMP\src
+  searching for changes
+  2:0aa0d45d1538 lock in subrepo @ 4 (shouldn't be seen in subrepo listing with push -r 1)
+  3:b6ef204e81fa lock in subrepo @ 2 (again)
+
+..and this should pull subrepo rev 4
+  $ hg pull -S -r 3  --config paths.default=. -R ../dest
+  pulling from $TESTTMP\src
+  searching for changes
+  adding changesets
+  adding manifests
+  adding file changes
+  added 2 changesets with 2 changes to 1 files
+  pulling subrepo sub from $TESTTMP/src/sub
+  searching for changes
+  no changes found
+  pulling subrepo sub from $TESTTMP/src/sub
+  searching for changes
+  no changes found
+  (run 'hg update' to get a working copy)
+
+Pull all subrepo revs locked into the parent
+  $ hg pull -S -r tip  --config paths.default=. -R ../dest
+  pulling from $TESTTMP\src
+  no changes found
+
+  $ hg slog -R ../dest/sub
+  5:5b1b79eebbd2 standalone subrepo commit @ 5
+  4:cac37e36a35f lock in subrepo @ 4 (shouldn't be seen in subrepo listing with push -r 1)
+  3:4aef09d75093 standalone subrepo commit @ 3
+  2:f9f5c072ebeb lock in subrepo @ 2
+  1:30154f8598be lock in subrepo @ 1
+  0:5879cf081918 standalone subrepo commit
+
+And finally the legacy push does a full subrepo push.  It seems like
+the exit code might be wrong given that it pushes a subrepo.
+  $ hg pull -S   --config paths.default=. -R ../dest
+  pulling from $TESTTMP\src
+  searching for changes
+  no changes found