Nested Subrepos non-recursive/deferred cloning

Fri Oct 28 13:30:18 EDT 2016

On Fri, Oct 28, 2016 at 3:03 AM, Pierre-Yves David
<pierre-yves.david at ens-lyon.org> wrote:
>
>
> On 10/23/2016 08:26 PM, Ken Frederickson wrote:
>>
>> Hello,
>>
>> When using subrepos, I frequently get in a situation where nested
>> subrepos result in multiple copies of the same repo. This can cause
>> several headaches, like a hit on sync time, confusion over which copy
>> of the redundant repo I'm co-developing, etc. Additionally, it's
>> troubling that cloning of the parent repo fails if the clone of the
>> subrepo fails, which could easily happen if the URL of the subrepo has
>> been altered (e.g. by a server migration).
>>
>> My solution is to write a custom extension that largely mimics the
>> functionality of subrepos, but does not automatically recursively clone
>> subrepos. Instead, I would make a command that I could execute at each
>> repo level that would pull one or all of its subrepos. My question is:
>> have some of these issues already been considered or partially addressed
>> with more recent subrepo work? Should I contribute to subrepo or should
>> I stick with an independent extension?
>
>
> We recently gained the ability to have both versions of a binary flag
> (e.g. `hg up --check` and `hg up --no-check`). (This is very new and not
> documented yet.) We could use this with the canonical subrepository
> option and clone to introduce an `hg clone --no-subrepository` that would
> skip the subrepo clone. This could be extended to other operations.
>
> What do you think?

Yes, I think preventing the automatic recursive clone would go a long
way. It would give the user the opportunity to edit the .hgsub file
and point it at an alternate URL before the subrepo clone occurs.
Personally, I'd also like the ability to clone individual subrepos by
name (perhaps by using the path defined in the .hgsub file), with
something like 'hg clone -S lib/foo', and to clone them all with
something like 'hg clone -S --all'. (I think 'clone' isn't the right
command; maybe 'hg update -S lib/foo'.) This is handy when your
dependencies differ based on your build configuration and you only
need a subset of your subrepos.
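
To illustrate the selection step, here is a minimal sketch of picking
subrepos by the path they have in .hgsub. It only handles plain
"path = source" lines and stops at the first section header, so it is
a simplification of the real file format. The function names and the
sample URLs are hypothetical, and nothing here calls into Mercurial.

```python
def parse_hgsub(text):
    """Return {path: source} from simplified .hgsub-style text."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("["):
            break  # assumption: ignore everything from the first section on
        path, sep, source = line.partition("=")
        if sep:
            entries[path.strip()] = source.strip()
    return entries


def select_subrepos(entries, wanted=None):
    """Return the subset of entries named in `wanted` (all if None)."""
    if wanted is None:
        return dict(entries)
    missing = [name for name in wanted if name not in entries]
    if missing:
        raise KeyError("unknown subrepo path(s): " + ", ".join(missing))
    return {name: entries[name] for name in wanted}


# Hypothetical .hgsub content, for illustration only.
SAMPLE = """\
lib/foo = https://example.com/hg/foo
lib/bar = ../bar
"""

chosen = select_subrepos(parse_hgsub(SAMPLE), ["lib/foo"])
```

A real command would then clone each selected source into its path;
that part is left out because it depends on how the option ends up
being designed.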

As for using such a feature to avoid redundant copies of repos in the
tree, it presents workflow challenges similar to those I describe
below. For any repo that would appear more than once in the tree, I
would manually skip cloning it after the first instance and point the
dependent repos' builds at the one copy. That loses the automatic
update of subrepo hashes, as well as the push protection when
dependent repos have uncommitted changes. What I want is the ability
to have a single copy of each repo and still have it tracked.

>> I understand the recommended way
>> <https://www.mercurial-scm.org/wiki/Subrepository> of avoiding redundant
>> copies of repos is to use a super repo ("shell repo"). Unfortunately,
>> this comes with a number of undesirable side effects. It is
>> incompatible with my company's strict policy that every check-in pass
>> a smoke test. If a repo does not maintain its own subrepos but instead
>> relies on a shell repo, the repo can't be built and smoked in an
>> atomic check-in operation; it requires a second commit to the shell
>> repo. The other side
>> effect is I necessarily need to create a companion shell repo for
>> everything I want my CI server to test. Then there are workflow issues
>> if the repo is a subrepo of an app shell: somehow I need to commit the
>> repo, the app shell, and the dedicated companion shell in an atomic way.
>>
>> I've tried the Guest Repo and Repoman extensions without success. Is a
>> new run at a subrepo alternative of interest to others?
>
>
> Can you elaborate on the issue you encountered with these solutions?

Here goes:

The docs recommend that "all repositories containing 'real' code have
no subrepositories of their own (ie they are leaf nodes)" and that I
use a shell repo to track their interdependent revisions. That means
that a leaf repo does not carry its own dependency information.
Therefore, an automated build server would not be able to have a job
dedicated to the leaf repo in isolation: it requires some other repo
to fill in missing dependency info or some hardcoding in the job
script. The latter is unwieldy because it amounts to trying to track
the dependency by hand and losing the benefit of subrepos altogether.
The need for a shell repo means I can't build and smoke a changeset in
an atomic checkin.

If I relax my requirement that every checkin must be smokable and
embrace the shell repo concept, I run into workflow issues. Let's say
I have a libA that contains unit tests and depends on libB. We've
already found out that I can't run a build job to run libA's tests
when I push the libA repo because the build job can't know how to
fetch libA's dependencies. Instead I need to create an additional
libAShell repo just to track libA's dependence on libB. The build job
would run when I push the shell. That means 1) bad changesets of libA
can make it onto the server and 2) I have to maintain a trivial shell.

That pattern is understandably inconvenient, but becomes silly with
just one more level of complexity. Now let's say that App depends on
libA. Now it's an AppShell that tracks libA as well as libB. Where
does libAShell live in the mix? It's the odd man out. When I change
libA within AppShell, I need to figure out how to update libAShell as
well so that my build server will validate the changes. So what would
the workflow be? Commit to AppShell/libA, push to local
libAShell/libA, commit libAShell, push libAShell to server, (wait for
smoke result and iterate if necessary), commit AppShell, push AppShell
to server.

Now what if libB also changed? Commit AppShell/libB, push to local
libB, push to server, (iterate), push to local libAShell/libA, commit
libAShell, push libAShell to server, (iterate), commit AppShell,
push AppShell to server.

Of course you could do something on the build server side that knows
how to move libAShell forward if you check in libA, but this
legitimate and simple scenario results in a lot of complexity.

The extension I've begun designing will flatten a nested subrepo tree
to a two-level tree with a single parent and N child repos. When a
child repo depends on another repo, the other repo becomes a peer in
the group of child repos. So say libA depends on libB and App depends
on libA and libB. Cloning App will clone libA and libB into a
dependency pool. When libA attempts to clone its own copy of libB, it
will instead be linked with the existing copy of libB, which is a peer
to it in the dependency pool. Like subrepos, if I have working changes
to libB, both libA and App will not be able to commit. When I commit
libB's changes, both libA's and App's .hgsubstate (or equivalent) will
be updated.
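
The flattening can be sketched in a few lines of Python. This is only
an illustration of the idea, not the extension itself: the dependency
graph is a plain dict that I assume has already been read from each
repo's subrepo file, and the function names are hypothetical.

```python
def flatten_dependencies(root, deps):
    """Return the unique repos reachable from root, in discovery order.

    `deps` maps a repo name to the list of repos it depends on. Each
    repo appears once in the result, however many repos depend on it.
    """
    pool = []
    seen = {root}

    def visit(repo):
        for child in deps.get(repo, []):
            if child not in seen:
                seen.add(child)
                pool.append(child)
                visit(child)

    visit(root)
    return pool


def dependents_of(target, deps):
    """Return the repos whose recorded state must change with target."""
    return sorted(repo for repo, children in deps.items() if target in children)


# The example from the text: App -> libA, libB and libA -> libB.
DEPS = {"App": ["libA", "libB"], "libA": ["libB"]}

pool = flatten_dependencies("App", DEPS)   # ['libA', 'libB']
affected = dependents_of("libB", DEPS)     # ['App', 'libA']
```

The `seen` set is what keeps libB from being cloned a second time for
libA, and `dependents_of` gives the set of repos that would be blocked
from committing while libB has working changes.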

There are several challenges to this approach and I'll have to set
some constraints. For example, what if App and libA depend on
different revisions of libB? I'm working through some of these
questions.
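
Detecting that situation, at least, is straightforward. The sketch
below assumes each repo's pinned revisions are available as
(dependent, dependency, revision) tuples; the tuple layout, the
function name and the revision strings are made up for illustration.
What to do once a conflict is found is the open question.

```python
from collections import defaultdict


def find_conflicts(pins):
    """Return {dependency: {revision: [dependents]}} for conflicting pins.

    A dependency is in conflict when its dependents pin more than one
    distinct revision of it.
    """
    by_dep = defaultdict(lambda: defaultdict(list))
    for dependent, dependency, revision in pins:
        by_dep[dependency][revision].append(dependent)
    return {
        dep: dict(revs)
        for dep, revs in by_dep.items()
        if len(revs) > 1
    }


# Hypothetical pins: App and libA disagree about libB.
PINS = [
    ("App", "libA", "e5f6"),
    ("App", "libB", "a1b2"),
    ("libA", "libB", "c3d4"),
]

conflicts = find_conflicts(PINS)
```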

>
> Cheers,
>
> --
> Pierre-Yves David