Thoughts and suggestions around working with shared libraries

Sat Jan 15 13:43:40 CST 2011

Hi,

I have been thinking about moving from Perforce to Mercurial for a while
now. What has mostly stopped me is the support for shared and external
libraries. Now that subrepository support finally seems stable and good
enough, that is changing.

But I still have a few issues. First, let me explain my proposed project
setup. I have two types of libraries. The first are shared libraries
that contain only my own code. They live in their own repositories and
are organized in the traditional way, with a main branch and possibly
some feature branches:

*        version1
|
*        version2
| \
|   *    feature1
|   |
|   *    feature1 v2
| /
*        merged with feature1

Then there are external libraries, which are mainly maintained by
someone else and could be under any version control system. However,
these quite often need local patches that should stay local and be
merged again when the official version changes. Occasionally we also
need to make official patches that are pushed to the official server.
These libraries also get their own repositories:

*        initial version
| \
|   *    official version 1
| / |
*   |    merged with official version
|   |
*   |    local patch
|   |
|   *    official version 2
| / | \
*   |  | merged with official version 2
|   |  |
|   |  * official patch by us
|   | /
|   *    official patch merged and updated to the official version
| /
*        merged with the patched version

Finally, we have different projects made up of subrepositories. Each
library is a subrepository referenced by a relative path, projects can
of course make their own local patches to the libraries, and different
projects can use different versions of them. The structure looks
something like this:

libs/local_library_1
libs/local_library_2
external_libs/external_library
src/
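
A minimal sketch, assuming every library in this layout is itself a
Mercurial repository: the project's .hgsub file would list each
subrepository as a `path = source` line with a relative source. The
`render_hgsub` helper and the `LIBRARIES` list are hypothetical and only
there to show what the file would contain.

```python
# Hypothetical list mirroring the layout above (src/ is the project's own
# code, not a subrepository).
LIBRARIES = [
    "libs/local_library_1",
    "libs/local_library_2",
    "external_libs/external_library",
]


def render_hgsub(paths):
    """Return .hgsub text with one 'path = source' line per subrepository.

    The source is the same relative path; relative sources are resolved
    against the parent repository's location.
    """
    return "".join(f"{p} = {p}\n" for p in paths)


if __name__ == "__main__":
    print(render_hgsub(LIBRARIES), end="")
```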

Everything might seem fine on the surface, but so far there are a few
problems. Let's start with the external library setup. I would like the
official branch to point to the external repository, and the default
branch to behave like a normal Mercurial repository. Perforce can't do
this either, but it's something I would really like to see supported.

The workaround, of course, is to use the native version control system,
grab the required version, and copy the files manually to the official
branch. Subrepository support does handle external version control
systems, but that doesn't help here, since I want the subrepository to
be at the root. It also doesn't support the kind of branching I want.

Instead, I propose being able to create a branch that is backed by a
remote repository of any type: Mercurial, Git, Subversion and so on.
This branch always has the initial repository as its base. Inside the
branch, you use the native commands of that version control system.

You can commit a snapshot whenever you have no locally changed files,
using an additional option to hg commit so that the system isn't
confused when the external branch is itself a Mercurial repository.
Snapshots are stored normally, but they only contain the history of the
snapshots, not the full external history. This is the simplest approach,
and it also saves a lot of space. You can always switch to this branch
and use the native version control system to see the full history.
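
A sketch of the snapshot precondition, assuming the native status
command prints nothing for a clean working copy (true for `hg status`,
`git status --porcelain` and `svn status` in the simple case without
externals). `snapshot_allowed`, `check_clean` and `STATUS_COMMANDS` are
hypothetical names. Unversioned files are treated as blocking, since
they would otherwise end up in the snapshot.

```python
import subprocess

# Native "what is modified?" commands (real CLIs); the mapping itself is
# only illustrative.
STATUS_COMMANDS = {
    "hg": ["hg", "status"],
    "git": ["git", "status", "--porcelain"],
    "svn": ["svn", "status"],
}


def snapshot_allowed(status_output: str) -> bool:
    """True if the native status output lists no modified, added, removed
    or unknown files, i.e. the working copy may be snapshotted."""
    return not any(line.strip() for line in status_output.splitlines())


def check_clean(kind: str, workdir: str) -> bool:
    # Run the native tool inside the external branch's working directory;
    # check=True raises CalledProcessError if the tool itself fails.
    result = subprocess.run(
        STATUS_COMMANDS[kind], cwd=workdir, capture_output=True, text=True, check=True
    )
    return snapshot_allowed(result.stdout)
```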

Each snapshot includes a special file with the path to the external
repository and its version, much as subrepository support records them
today in .hgsub and .hgsubstate. The native version control system's own
files and directories (like .svn) are not included in the snapshot.
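
To make these snapshot rules concrete, here is a sketch of which files go
into a snapshot and what the state file could record. The file name
`.hgexternal`, its `kind url revision` line format and the list of
metadata directories are all assumptions for illustration, not anything
Mercurial defines.

```python
import os

# Assumed set of native metadata directory names; extend as needed.
VCS_METADATA_DIRS = {".svn", ".git", ".hg", "CVS", ".bzr"}

# Hypothetical name of the per-snapshot state file.
STATE_FILE = ".hgexternal"


def snapshot_files(root):
    """Yield file paths (relative to root) that belong in a snapshot."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place keeps os.walk out of metadata directories.
        dirnames[:] = sorted(d for d in dirnames if d not in VCS_METADATA_DIRS)
        for name in sorted(filenames):
            yield os.path.relpath(os.path.join(dirpath, name), root)


def state_line(kind, url, revision):
    """One line of the hypothetical state file: VCS kind, source, revision."""
    return f"{kind} {url} {revision}\n"


def write_state(root, kind, url, revision):
    # The state file is an ordinary file, so it is part of the snapshot.
    with open(os.path.join(root, STATE_FILE), "w") as f:
        f.write(state_line(kind, url, revision))
```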

You can use hg update to switch to any snapshot; in that case it always
calls the native version control system to switch to the matching
version.
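
The dispatch to the native tool could be as simple as this sketch. The
three commands are the standard ones for moving a working copy to a
revision; the `NATIVE_UPDATE` table and both functions are hypothetical.

```python
import subprocess

# Hypothetical table: external branch kind -> native "go to revision" argv.
NATIVE_UPDATE = {
    "svn": lambda rev: ["svn", "update", "-r", rev],
    "git": lambda rev: ["git", "checkout", rev],
    "hg": lambda rev: ["hg", "update", "-r", rev],
}


def native_update_command(kind, revision):
    """Build the native command line, rejecting unknown kinds."""
    try:
        return NATIVE_UPDATE[kind](revision)
    except KeyError:
        raise ValueError(f"unsupported external kind: {kind!r}") from None


def native_update(kind, revision, workdir):
    # check=True raises CalledProcessError if the native tool fails.
    subprocess.run(native_update_command(kind, revision), cwd=workdir, check=True)
```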

When you switch to another branch (which, again, shouldn't be allowed
with local changes), the native version control files are copied to a
hidden place, for example inside the .hg directory, together with the
changeset of the active snapshot. When you switch back, Mercurial first
updates to that last active snapshot and then copies the native version
control files back. Finally, it runs a native version control update to
the correct version, followed by an hg update to that version.
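
A sketch of the stash and restore step, assuming the native metadata is
moved (one reading of "copied to a hidden place") into a hypothetical
.hg/external-stash directory, so nothing stale is left in the working
copy on the other branch. The caller is assumed to run hg update to
`recorded_snapshot(...)` before `unstash`, and the native update
afterwards, in the order described above. All names are illustrative.

```python
import os
import shutil

# Assumed native metadata directory names. The main repository's own .hg
# is never moved.
VCS_METADATA_DIRS = {".svn", ".git", "CVS", ".bzr"}


def _move_metadata(src_root, dst_root):
    """Move every metadata directory under src_root to the same relative
    location under dst_root."""
    for dirpath, dirnames, _ in os.walk(src_root):
        if dirpath == src_root and ".hg" in dirnames:
            dirnames.remove(".hg")  # never descend into the main repository
        for d in [d for d in dirnames if d in VCS_METADATA_DIRS]:
            src = os.path.join(dirpath, d)
            dst = os.path.join(dst_root, os.path.relpath(src, src_root))
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.move(src, dst)
            dirnames.remove(d)  # it has been moved, so do not descend into it


def stash(workdir, stash_dir, active_snapshot):
    """Leaving the external branch: save metadata and the active snapshot."""
    _move_metadata(workdir, os.path.join(stash_dir, "meta"))
    os.makedirs(stash_dir, exist_ok=True)
    with open(os.path.join(stash_dir, "snapshot"), "w") as f:
        f.write(active_snapshot + "\n")


def recorded_snapshot(stash_dir):
    """Return the snapshot saved by stash(), or None if there is no stash."""
    path = os.path.join(stash_dir, "snapshot")
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return f.read().strip()


def unstash(workdir, stash_dir):
    """Returning: move the metadata back and drop the stash."""
    _move_metadata(os.path.join(stash_dir, "meta"), workdir)
    shutil.rmtree(stash_dir)
```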

The special case where no local state exists (remember that the native
version control files are stored only locally) is also easy to support:
just start with an empty working directory and call the native version
control system to update.
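
Putting both paths together, switching back to an external branch could
be planned as follows. `return_plan` is a hypothetical helper that only
lists the steps: `recorded` is the changeset saved when the branch was
left (or None when there is no local state), and `external_rev` is the
revision named in the snapshot's state file.

```python
from typing import List, Optional


def return_plan(recorded: Optional[str], external_rev: str) -> List[str]:
    """Ordered steps for switching back to an external branch."""
    if recorded is None:
        # No local state (e.g. a fresh clone): start empty and let the
        # native tool check out the external revision.
        return [
            "empty the working directory",
            f"native checkout of {external_rev}",
        ]
    # Normal case: restore the old snapshot and metadata, then move both
    # the native working copy and Mercurial to the current version.
    return [
        f"hg update {recorded}",
        "move stashed metadata back into the working directory",
        f"native update to {external_rev}",
        "hg update to the matching snapshot",
    ]
```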

Note that this system now allows Mercurial branching of those
snapshots, as long as we store only one official last external state,
and these branches are always in sync with a changeset of the external
version control system.

But remember, we also wanted local changes that are not in sync with the
external version control system, like the default branch in the example
above. These can now be made with Mercurial's normal branching and
merging.

The above might seem confusing, partly because English is not my native
language, but mostly because I skipped a lot of details. I do have a
very clear idea of how almost every special case could work, so just ask
if something is unclear.

With this, the external library setup above is supported. Once we have
that, subrepositories should be simplified to support only relative
paths and nothing else. I'm always in favor of simplicity, and having
two systems that do the same thing is never good, either for the code
or for the users.
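
Under this proposal the only accepted .hgsub sources would be relative
paths. A minimal check could look like this; the parser is simplified
(it skips comments and section headers such as [subpaths]) and both
functions are hypothetical.

```python
import os
import re

# A source with a URL scheme, e.g. "http://host/repo" or "ssh://host/repo".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def parse_hgsub(text):
    """Parse simplified .hgsub content into {path: source}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";", "[")):
            continue  # blank line, comment or section header
        path, _, source = line.partition("=")
        entries[path.strip()] = source.strip()
    return entries


def non_relative_sources(entries):
    """Paths whose sources the proposal would reject: a [kind] prefix,
    a URL scheme or an absolute path."""
    bad = []
    for path, source in entries.items():
        if (source.startswith("[") or _SCHEME.match(source)
                or source.startswith("/") or os.path.isabs(source)):
            bad.append(path)
    return bad
```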

In the same spirit of simplicity, I also propose that the .hg
directories of subrepositories be stored inside the main .hg directory.
This would get rid of the pull/update problem (today subrepositories are
only fetched on hg update, not on hg pull): pull would always pull all
subrepositories, which makes offline work possible. I don't know
Mercurial's internals well enough to say exactly how they should be
stored, but I'm sure you could come up with a way.
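
One possible, purely hypothetical layout keeps each subrepository's
store under a reserved `subrepos` directory inside the main .hg,
mirroring the subrepository's path. The directory name and
`subrepo_store` are assumptions; the check only refuses paths that
would escape the repository.

```python
import os


def subrepo_store(main_root, subpath):
    """Hypothetical store location for a subrepository inside the main .hg.

    'libs/local_library_1' maps to
    <main_root>/.hg/subrepos/libs/local_library_1.
    """
    norm = os.path.normpath(subpath)
    if os.path.isabs(norm) or norm == os.pardir or norm.startswith(os.pardir + os.sep):
        raise ValueError(f"subrepository path escapes the repository: {subpath!r}")
    return os.path.join(main_root, ".hg", "subrepos", norm)
```

With every store in one known place, a single pull can walk all of them.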

But one huge problem remains with the setup above: you have made
project-specific changes to a subrepository and want to push only some
of them back to the library's own repository. Here Perforce is really
superior, since it lets you integrate single files or directories, or
even parts of a file (although in that case you need to force another
integrate if you need further changes later). The problem is not limited
to this use case; it's common when merging branches. For example, you
have a release branch, you fixed a bug on the default branch, and now
you want only that single bugfix on the release branch.

I have seen suggestions about cherry-picking with hg transplant or hg
export/import, or using the mq extension, but to be honest they are a
mess; there should really be an easier way to handle such a common
problem. One option would be some kind of forced merge: the two
branches are merged, but you as a user get to select exactly which parts
to merge. These merges do not take part in later normal merges, but they
still store extra metadata so that you can see where the merge came
from. To filter the merge before making your choices, you could also
specify file wildcard patterns.
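
The filtering step and the extra metadata could be sketched as follows.
Both helpers and the `selective-merge-from` key are hypothetical. Note
that with `fnmatchcase` a `*` also matches `/`, so `src/*.c` includes
files in subdirectories, unlike Mercurial's own glob patterns.

```python
from fnmatch import fnmatchcase


def select_files(changed_files, patterns):
    """Keep only changed files matching at least one wildcard pattern
    (the pre-filter before choosing hunks in the proposed forced merge)."""
    return [f for f in changed_files if any(fnmatchcase(f, p) for p in patterns)]


def merge_record(source_branch, source_rev, files):
    """Hypothetical metadata stored with the resulting commit: it names the
    origin of the changes without making the source a merge parent."""
    return {"selective-merge-from": f"{source_branch}@{source_rev}",
            "files": sorted(files)}
```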

Transplant almost does this already, but for nearly all use cases it's
far too complicated (remember simplicity). It's also not very well
documented; I have a hard time figuring out exactly what it does. My
suggestion might not be the best either, so I'm open to better
suggestions, or to corrections if, for example, transplant is actually
perfect for this case.

This message is getting far too long, so I'll stop here for now, apart
from a few general comments about Mercurial.

Overall, Mercurial is quite good, but it definitely suffers from not
being an integrated package: there are many different extensions, some
of which do almost the same thing. It also suffers from trying to
support every workflow you can imagine.

What you should do instead is work out a minimal set of workflows that
suits all projects. Document those workflows very well, and make sure
Mercurial handles them natively, without any extensions. Make the things
users do most often as easy as possible; for example, pulling should
automatically update unless you ask otherwise. This might require a new
set of commands and different options, but IMO it would be well worth
it.