Nested Subrepos non-recursive/deferred cloning

Tue Nov 1 12:07:11 EDT 2016

On Mon, Oct 31, 2016 at 10:30 PM, Pierre-Yves David
<pierre-yves.david at ens-lyon.org> wrote:
> This email has many interesting bits, especially:
>
> * a bit about avoiding cloning,
>
> * a bit about avoiding multiple copies of the same repository,
>
> * a bit about tracking dependencies.
>
> That last one is a complex and dense topic, and my reply to it is taking a
> while. So I'm sending the answer to the first two bits now to avoid
> delaying it any further.
>
>
> On 10/28/2016 07:30 PM, Ken Frederickson wrote:
>>
>> On Fri, Oct 28, 2016 at 3:03 AM, Pierre-Yves David
>> <pierre-yves.david at ens-lyon.org> wrote:
>>>
>>>
>>> On 10/23/2016 08:26 PM, Ken Frederickson wrote:
>>>>
>>>>
>>>> Hello,
>>>>
>>>> When using subrepos, I frequently get into a situation where nested
>>>> subrepos result in multiple copies of the same repo. This can cause
>>>> several headaches, like a hit on sync time, confusion about which copy
>>>> of the redundant repo I'm co-developing, etc. Additionally, it's
>>>> troubling that cloning of the parent repo fails if the clone of the
>>>> subrepo fails, which could easily happen if the URL of the subrepo has
>>>> been altered (e.g. after a server migration).
>>>>
>>>> My solution is to write a custom extension that largely mimics the
>>>> functionality of subrepos, but does not automatically recursively clone
>>>> subrepos. Instead, I would make a command that I could execute at each
>>>> repo level that would pull one or all of its subrepos. My question is:
>>>> have some of these issues already been considered or partially addressed
>>>> with more recent subrepo work? Should I contribute to subrepo or should
>>>> I stick with an independent extension?
>>>
>>>
>>>
>>> We recently gained the ability to have both versions of a binary flag
>>> (e.g. `hg up --check` and `hg up --no-check`). This is very new and not
>>> documented yet. We could use this with the canonical subrepository option
>>> and clone to introduce an `hg clone --no-subrepository` that would skip
>>> the subrepo clone. This could be extended to other operations.
>>>
>>> What do you think ?
>>
>>
>> Yes, I think preventing the automatic recursive clone would go a long
>> way. This would give the user the opportunity to modify the .hgsub
>> file before the subrepo clone has occurred to point to an alternate
>> URL. Personally, I'd also like the ability to clone individual
>> subrepos by name (perhaps by using the path defined in the .hgsub
>> file). Something like 'hg clone -S lib/foo'. And clone them all with
>> something like 'hg clone -S --all'. (I think 'clone' isn't the right
>> command. Maybe 'hg update -S lib/foo'). This is handy when your
>> dependencies differ based on your build configuration and you only
>> need a subset of your subrepos.
>
>
> `hg update` takes a revision as its argument, so it might not be the best
> place. We could use `hg revert` for this and it would not even need a -S
> argument.

Yeah, I can see why 'hg update' isn't right. 'hg revert' is
interesting, although from its name and description I expect it to
restore to a changeset I already had, not perform the subrepo fetch.
Maybe 'hg clone' is still the closest to what I'm describing, but
instead of providing the source path, you provide the local path given
in the .hgsub file. When using this -S argument, the -r arg would be
implied by the hash in .hgsubstate. It's also compatible with other
clone args: if I give it a -r or -u, it clones the subrepo and the
parent repo is immediately dirty (as shown in 'hg status -S').
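
The lookup proposed above (name a subrepo by the local path used in .hgsub and take the revision implicitly from .hgsubstate) can be sketched in Python. This is a hypothetical helper, not part of Mercurial. It assumes the simple `path = source` form of .hgsub entries and the `<hash> <path>` form of .hgsubstate lines, and it skips the `[subpaths]` remapping section rather than applying it.

```python
from typing import Dict, NamedTuple


class SubrepoSpec(NamedTuple):
    path: str      # local path inside the parent (left-hand side in .hgsub)
    source: str    # where to clone from (right-hand side in .hgsub)
    revision: str  # pinned changeset hash taken from .hgsubstate


def parse_hgsub(text: str) -> Dict[str, str]:
    """Map local path -> source for the unsectioned 'path = source' lines.

    Comment lines and everything inside a [section] such as [subpaths]
    are skipped; remapping is not applied in this sketch.
    """
    entries: Dict[str, str] = {}
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("["):
            in_section = True
            continue
        if not in_section:
            path, _, source = line.partition("=")
            entries[path.strip()] = source.strip()
    return entries


def parse_hgsubstate(text: str) -> Dict[str, str]:
    """Map local path -> pinned hash from '<hash> <path>' lines."""
    state: Dict[str, str] = {}
    for line in text.splitlines():
        if line.strip():
            revision, _, path = line.partition(" ")
            state[path.strip()] = revision
    return state


def resolve_subrepo(path: str, hgsub: str, hgsubstate: str) -> SubrepoSpec:
    """Look up a single subrepo by the local path used in .hgsub."""
    sources = parse_hgsub(hgsub)
    if path not in sources:
        raise KeyError(f"{path!r} is not listed in .hgsub")
    # The revision comes from .hgsubstate, so no explicit -r is needed.
    return SubrepoSpec(path, sources[path], parse_hgsubstate(hgsubstate)[path])
```

A hypothetical `hg clone -S lib/foo` would need exactly the source and pinned hash that `resolve_subrepo("lib/foo", hgsub_text, hgsubstate_text)` returns. Any other subrepos listed in .hgsub would be left uncloned.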

>
> Also, I do not think `hg clone` itself triggers the recursive clone. If I
> remember correctly, it is the update after the clone that triggers cloning
> of the repositories that need updating. So cloning without an update
> (`hg clone --noupdate`) should get you halfway there.
>

Good to know the distinction. After I do the --noupdate clone, I just
need a way of updating the immediate repo and not its subrepos.

Once again, these features don't give me the benefit of dependency
tracking and push protection, which is really the most powerful and
convenient part of subrepos. I would have to manually overwrite hashes
in .hgsubstate files if a child depends on another child.
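
The manual workaround mentioned above is to overwrite a child's pinned hash in a parent's .hgsubstate. That amounts to a small text edit. The helper below is hypothetical and assumes the `<hash> <path>` line format. In normal use, Mercurial writes this file itself when the parent is committed.

```python
import re

# A full changeset id as written in .hgsubstate: 40 lowercase hex digits.
HASH_RE = re.compile(r"[0-9a-f]{40}")


def pin_subrepo(hgsubstate: str, path: str, new_revision: str) -> str:
    """Return .hgsubstate text with `path` pinned to `new_revision`.

    All other lines, and their order, are kept unchanged.
    """
    if not HASH_RE.fullmatch(new_revision):
        raise ValueError("expected a full 40-character hex changeset hash")
    lines = []
    found = False
    for line in hgsubstate.splitlines():
        _old_revision, _, entry_path = line.partition(" ")
        if entry_path == path:
            lines.append(f"{new_revision} {path}")
            found = True
        else:
            lines.append(line)
    if not found:
        raise KeyError(f"{path!r} has no entry in .hgsubstate")
    return "\n".join(lines) + "\n"
```

This edit has to be repeated in every parent that tracks the same child. That repetition is the bookkeeping real subrepos would otherwise do automatically.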

>> On the practical usage of the feature to avoid redundant copies of
>> repos in the tree, this presents similar workflow challenges to what I
>> describe below. For any repo that would appear more than once in the
>> tree, I would manually avoid cloning it after the first instance and
>> point dependent repos' builds to the one copy. This loses the benefits
>> of automatic update of subrepo hashes and push protection if dependent
>> repos have uncommitted changes. What I want is the ability to have a
>> single copy of repos and still have them track.
>
>
> Have you given the "share" extension a try? A couple of versions ago, it
> gained the ability to use a "clone pool". Any clone of a repository already
> in the pool is actually performed as the creation of a new share (a new
> working copy sharing its history with the others) plus an update of the
> history in the pooled repository.
>
> It seems like it could fit your need perfectly.
>

I was not aware of this extension! This looks interesting and I played
with it for a while. I can say that it does not help in the dependency
tracking or push protection areas. The simplest form of what I need is
for a parent and one child to track the state of another child in their
respective .hgsubstate files, with push protection if the dependency
repo is dirty. Since a share does not include the working directory,
there is no push protection. And since sharing histories does not
affect which revision is checked out across the pool, the automatic
update of subrepo hashes in .hgsubstate won't happen.
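
The behaviour wanted here has three parts. Two repos track one shared dependency. Neither may commit while that dependency is dirty. Committing the dependency updates the hash each of them records. This can be modelled in a few lines. It is an in-memory sketch with hypothetical names (`Repo`, `commit`, `CommitBlocked`). It models only commit blocking and hash propagation and does not touch real repositories.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Repo:
    name: str
    revision: str
    dirty: bool = False
    # tracked dependency name -> revision recorded for it
    # (the role .hgsubstate plays for real subrepos)
    substate: Dict[str, str] = field(default_factory=dict)


class CommitBlocked(Exception):
    pass


def dirty_dependencies(pool: Dict[str, Repo], name: str) -> List[str]:
    """Tracked dependencies of `name` that have uncommitted changes."""
    return [dep for dep in pool[name].substate if pool[dep].dirty]


def commit(pool: Dict[str, Repo], name: str, new_revision: str) -> None:
    """Commit `name` unless a tracked dependency is dirty."""
    blockers = dirty_dependencies(pool, name)
    if blockers:
        raise CommitBlocked(f"{name}: uncommitted changes in {', '.join(blockers)}")
    repo = pool[name]
    repo.revision = new_revision
    repo.dirty = False
    # Every repo tracking `name` now records the new revision, which
    # leaves that dependent with a change of its own to commit.
    for other in pool.values():
        if name in other.substate:
            other.substate[name] = new_revision
            other.dirty = True
```

With libB dirty, committing App is refused. Committing libB updates both libA's and App's recorded hash. The changes then have to be committed bottom-up: libA first, then App.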

>> […]
>>
>> The extension I've begun designing will flatten a nested subrepo tree
>> to a two-level tree with a single parent and N child repos. When a
>> child repo depends on another repo, the other repo becomes a peer in
>> the group of child repos. So say libA depends on libB and App depends
>> on libA and libB. Cloning App will clone libA and libB into a
>> dependency pool. When libA attempts to clone its own copy of libB, it
>> will instead be linked with the existing copy of libB, which is its
>> peer in the dependency pool. Like subrepos, if I have working changes
>> to libB, neither libA nor App will be able to commit. When I commit
>> libB's changes, both libA's and App's .hgsubstate (or equivalent) will
>> be updated.
>
>
> You should have a look at the "share pool" mechanism I pointed to above;
> it seems like it could cover a large part of your use case here.
>

The share extension would simplify these workflows somewhat because I
would not need to push/pull between the local copies of the repos.
Other than that, it remains an awkward interaction between the two
local repos, only so that I can maintain the shell repos that track
dependency info. What I was trying to illustrate above is that the shell repo
paradigm is not well suited to certain workflows. In particular, I'd
argue it's a barrier to automated builds for libs.
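
The flattening step in the proposed extension works in three moves. It walks the nested subrepo declarations once. It gives every distinct repo a single slot in a flat dependency pool. It then lets each repo link to its pooled peers instead of cloning its own copy. The sketch below illustrates that idea. The `deps` mapping and the `flatten` helper are assumptions for illustration, not an existing Mercurial or extension API.

```python
from typing import Dict, List, Tuple


def flatten(root: str, deps: Dict[str, List[str]]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Flatten nested subrepo declarations into a two-level layout.

    `deps` maps a repo name to the repos it declares as subrepos.
    Returns (pool, links): `pool` lists every repo below `root` exactly
    once, in the order it would first be cloned; `links` maps `root`
    and each pooled repo to the peers it tracks.
    """
    seen = {root}
    pool: List[str] = []

    def visit(name: str) -> None:
        for dep in deps.get(name, []):
            if dep not in seen:  # later requests reuse the pooled copy
                seen.add(dep)
                pool.append(dep)
                visit(dep)

    visit(root)
    links = {name: list(deps.get(name, [])) for name in [root] + pool}
    return pool, links
```

With the libA/libB/App example from the quoted design, each library appears once in `pool`. libA's link to libB points at the same pooled copy that App uses. The `seen` set also stops a dependency cycle from recursing forever.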

Thanks for your advice!
-Ken

>>
>> […]
>>
>
> Cheers,
>
> --
> Pierre-Yves David