Nested Subrepos non-recursive/deferred cloning

Tue Nov 1 04:30:46 UTC 2016

This email has many interesting bits, especially:

* a bit about avoiding cloning,

* a bit about avoiding multiple copies of the same repository,

* a bit about tracking dependencies.

That last one is a complex and dense topic, and my reply to it is taking
a while. So I'm sending the answer to the first two bits now rather than
delaying it any further.

On 10/28/2016 07:30 PM, Ken Frederickson wrote:
> On Fri, Oct 28, 2016 at 3:03 AM, Pierre-Yves David
> <pierre-yves.david at ens-lyon.org> wrote:
>>
>> On 10/23/2016 08:26 PM, Ken Frederickson wrote:
>>>
>>> Hello,
>>>
>>> When using subrepos, I frequently get in a situation where nested
>>> subrepos result in multiple copies of the same repo. This can cause
>>> several headaches, like a hit on sync time, confusion which copy of the
>>> redundant repo I'm co-developing, etc. Additionally, it's troubling that
>>> cloning of the parent repo fails if the clone of the subrepo fails,
>>> which could easily happen if the URL of the subrepo has been altered
>>> (i.e. server migration).
>>>
>>> My solution is to write a custom extension that largely mimics the
>>> functionality of subrepos, but does not automatically recursively clone
>>> subrepos. Instead, I would make a command that I could execute at each
>>> repo level that would pull one or all of its subrepos. My question is:
>>> have some of these issues already been considered or partially addressed
>>> with more recent subrepo work? Should I contribute to subrepo or should
>>> I stick with an independent extension?
>>
>>
>> We recently gained the ability to have both versions of a binary flag (e.g.
>> `hg up --check` and `hg up --no-check`). (This is very new and not documented
>> yet.) We could use this with the canonical subrepository option and clone to
>> introduce a `hg clone --no-subrepository` that would skip the subrepo clone.
>> This could be extended to other operations.
>>
>> What do you think ?
>
> Yes I think preventing the automatic recursive clone would go a long
> way. This would give the user the opportunity to modify the .hgsub
> file before the subrepo clone has occurred to point to an alternate
> url. Personally, I'd also like the ability to clone individual
> subrepos by name (perhaps by using the path defined in the .hgsub
> file). Something like 'hg clone -S lib/foo'. And clone them all with
> something like 'hg clone -S --all'. (I think 'clone' isn't the right
> command. Maybe 'hg update -S lib/foo'). This is handy when your
> dependencies differ based on your build configuration and you only
> need a subset of your subrepos.

`hg update` takes a revision as its argument, so it might not be the best
place. We could use `hg revert` for this, and it would not even need a -S
argument.

Also, I do not think `hg clone` itself triggers the recursive clone. If I
remember correctly, the update after the clone is what triggers the clone
of any subrepository that needs updating. So cloning without an update
(`hg clone --noupdate`) should get you halfway there.
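
A minimal sketch of that two-step approach, written as small Python helpers that only build the command lines. The helper names, the URL and the destination are made up for illustration; `hg clone --noupdate` and `hg update` are the only real pieces, and whether the subrepos are fetched at update time rests on my recollection above.

```python
import subprocess


def clone_noupdate_cmd(source, dest):
    """Command line that clones history only, without checking out a working copy."""
    return ["hg", "clone", "--noupdate", source, dest]


def update_cmd(dest, rev=None):
    """Command line for the later update, the step expected to fetch subrepos."""
    cmd = ["hg", "--cwd", dest, "update"]
    if rev is not None:
        cmd += ["--rev", rev]
    return cmd


def run(cmd):
    """Execute a command line; requires Mercurial to be installed and on PATH."""
    subprocess.run(cmd, check=True)


# Hypothetical usage (URL and destination are placeholders):
#   run(clone_noupdate_cmd("https://example.org/hg/App", "App"))
#   ... later, once you are ready for the subrepos to be fetched ...
#   run(update_cmd("App"))
```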

> On the practical usage of the feature to avoid redundant copies of
> repos in the tree, this presents similar workflow challenges to what I
> describe below. For any repo that would appear more than once in the
> tree, I would manually avoid cloning it after the first instance and
> point dependent repos' builds to the one copy. This loses the benefits
> of automatic update of subrepo hashes and push protection if dependent
> repos have uncommitted changes. What I want is the ability to have a
> single copy of repos and still have them track.

Did you give the "share" extension a shot? A couple of versions ago, it
gained the ability to use a "clone pool". Any clone of a repository
already in the pool is actually performed by creating a new share (a new
working copy that shares its history with the others) and updating the
history in the pooled repository.

It seems like it could fit your need perfectly.
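
As a rough sketch, the pool can be enabled for a single clone from the command line. The helper name, URL and pool path below are placeholders; the option names (`extensions.share` and `share.pool`) are the ones I believe the share extension uses, so please check `hg help share` for your version. Whether clones triggered for subrepos also go through the pool is something to verify on your setup.

```python
def pooled_clone_cmd(source, dest, pool):
    """Command line for a clone that stores its history in a share pool."""
    return [
        "hg",
        "--config", "extensions.share=",   # enable the bundled share extension
        "--config", "share.pool=" + pool,  # directory holding the pooled stores
        "clone", source, dest,
    ]


# Hypothetical usage (all three arguments are placeholders):
#   pooled_clone_cmd("https://example.org/hg/libB", "App/libB", "/home/me/.hg-pool")
```

The same two settings can be put in the `[extensions]` and `[share]` sections of your hgrc to apply to every clone.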

> […]
>
> The extension I've begun designing will flatten a nested subrepo tree
> to a two-level tree with a single parent and N child repos. When a
> child repo depends on another repo, the other repo becomes a peer in
> the group of child repos. So say libA depends on libB and App depends
> on libA and libB. Cloning App will clone libA and libB into a
> dependency pool. When libA attempts to clone its own copy of libB, it
> will instead be linked with the existing copy of libB which is peer to
> it in the dependency pool. Like subrepos, if I have working changes to
> libB, both libA and App will not be able to commit. When I commit
> libB's changes, both libA and App's .hgsubstate (or equivalent) will
> be updated.
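
For reference, the flattening described in the quoted paragraph can be sketched as follows. This is only an illustration of the idea under my own assumptions (a plain dict mapping each repo to its direct dependencies, hypothetical function names), not the design of the extension itself.

```python
def flatten_dependencies(root, deps):
    """Return the dependency pool for `root`: every repo reachable from it, once each.

    `deps` maps a repo name to the list of repos it depends on directly.
    """
    pool = []      # peers in the pool, in first-seen order
    seen = {root}  # also protects against dependency cycles

    def visit(repo):
        for child in deps.get(repo, []):
            if child not in seen:
                seen.add(child)
                pool.append(child)
                visit(child)

    visit(root)
    return pool


def dependents(repo, deps):
    """Repos that depend directly on `repo`, i.e. those blocked by its uncommitted changes."""
    return sorted(name for name, children in deps.items() if repo in children)


# The example from the quoted text: App needs libA and libB, libA needs libB.
deps = {"App": ["libA", "libB"], "libA": ["libB"], "libB": []}
print(flatten_dependencies("App", deps))  # ['libA', 'libB']
print(dependents("libB", deps))           # ['App', 'libA']
```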

You should have a look at the "share pool" mechanism I pointed to above;
it seems like it could cover a large part of your use case here.

>
> […]
>

Cheers,

-- 
Pierre-Yves David