Managing Dependencies with Subrepositories - Partial Checkout and Commit/Push Propagation

Michael P. Jung mpjung at terreon.de
Fri Jan 15 09:08:23 CST 2010


I'm sorry for this rather long message. In it I'm going to explain the
way I use subrepositories to manage dependencies in my web projects. If
you're tired of reading so much text, just skip to the summary at the
end, where I briefly sum up the two problems I have.



I recently made the switch from SVN to HG and one of the big stumbling
blocks is dependency management. I don't want to maintain a script for
fetching dependencies, but rather let the revision control system handle
this.

My concrete use case is the development of many different websites that
share a lot of dependencies. Depending on the project I have up to 10
external dependencies - sometimes even more.

In order to give you a more concrete use case let's imagine I was
developing the website 'example.com'. A typical project layout looks like:

~/hg/example.com/documents/          : Documentation, Mockups, etc.
~/hg/example.com/media/css/          : CSS Files
~/hg/example.com/media/jquery/       : jQuery
~/hg/example.com/media/img/          : Image Files
~/hg/example.com/media/js/           : JS Files
~/hg/example.com/python/             : PYTHONPATH of the project
~/hg/example.com/python/chimes/      : Chimes
~/hg/example.com/python/django/      : Django
~/hg/example.com/python/south/       : South
~/hg/example.com/python/sorl/        : SORL Thumbnails
~/hg/example.com/python/...          : ...more dependencies
~/hg/example.com/python/example_com/ : Python files of the project
~/hg/example.com/templates/          : Template directory
~/hg/example.com/wsgi/django.wsgi    : WSGI file for deployment
~/hg/example.com/...                 : ...

I modified the "manage.py" so that it automatically detects this
directory structure and adds the "python" directory to the PYTHONPATH.
As I keep all dependencies in the python directory there is no
virtualenv involved and deploying applications including their
dependencies is a snap.
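
To illustrate the idea, a minimal sketch of such a path setup could look
like the following. The function names find_python_dir and
add_python_dir are made up for this example, and the sketch assumes the
layout shown above; it is not the actual code from my manage.py.

```python
import os
import sys


def find_python_dir(start):
    """Walk upwards from `start` until a directory with a 'python'
    subdirectory is found. Returns that subdirectory, or None."""
    current = os.path.abspath(start)
    while True:
        candidate = os.path.join(current, "python")
        if os.path.isdir(candidate):
            return candidate
        parent = os.path.dirname(current)
        if parent == current:
            # Reached the filesystem root without finding the layout.
            return None
        current = parent


def add_python_dir(start, path=None):
    """Prepend the detected 'python' directory to the import path.

    `path` defaults to sys.path; passing a plain list makes the helper
    easy to try out without touching the real import path.
    """
    if path is None:
        path = sys.path
    python_dir = find_python_dir(start)
    if python_dir is not None and python_dir not in path:
        path.insert(0, python_dir)
    return python_dir
```

In manage.py or django.wsgi this would be called once, before any
project imports, with the directory of the file itself as the starting
point, e.g. add_python_dir(os.path.dirname(os.path.abspath(__file__))).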

Since I include the dependencies directly in the PYTHONPATH I need the
subrepositories to skip any extra directory hierarchy. E.g. the complete
Django repository [1] would cause trouble, as I only need the
subdirectory "django" in my PYTHONPATH. Since I want to be as
independent as possible of external resources for development and
deployment, I created copies of those repositories that are directly
includable in the Python path. chimes [2] is an example of what such a
dependency-only repository looks like.

Since I'm managing those repositories myself it's not a big burden to
organize them in the way I need them. Still it's far from perfect.
Another option would be to have an extra directory like

~/hg/example.com/external/django/

and just link the required bits in place using symbolic links

~/hg/example.com/python/django -> ../external/django/django

This would allow me to check out the complete dependency, but it would
probably break for Windows users. Right now this is not a huge issue, as
we're all Linux or Mac users. Nonetheless it has kept me from doing it
this way so far.

So for me it would be ideal if there was a way to do partial checkouts
of repositories. It's not a big deal if the entire history and even the
entire tree is pulled from the remote repository. Just the local sandbox
should be somewhat chrooted to the part I require. So /django/.hg/ could
contain everything from the django repository, but the sandbox should
only contain files from the django subdirectory, without an intermediate
directory.

Surely all this could be solved with some even more magic manage.py and
django.wsgi files for deployment, but I like to keep things as simple as
possible and prefer things that "just work" (tm).



Now for something different and far more critical. It's also related to
subrepositories and the way they're integrated into Mercurial. Since I
mainly use subrepos for managing dependencies it's extremely quirky that
every time I push the main repository all subrepositories are pushed as
well. If I make some change to a subrepository and want to delay the
commit and/or the push a bit, there is no way to do so. Even worse,
commit messages from the main project are sometimes carried off to the
subrepositories causing totally mixed up change logs. Once I even
managed to create an empty changeset for the subrepository which just
contained the commit message from the main repository. This is really ugly.

The ideal solution for me would be if commit and push never propagated
to subrepositories. It would be far less dangerous if there was simply a
warning telling the user that some subrepository has been changed and
that it needs to be committed and pushed separately.

Also I'm sometimes on the road and only have a rather slow GSM
connection via my mobile phone available. Pushing changes via this
connection is rather slow, but it gets even slower when every single
subrepo is pushed as well. The network latency really kills the
usability of Hg in this case. It would make far more sense to only push
subrepos if the .hgsubstate file has been changed. Otherwise pushing
subrepos is just a waste of time.
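
As a rough sketch of the check I have in mind: Mercurial pins each
subrepo in a file named .hgsubstate, one "<revision> <path>" pair per
line, so comparing two versions of that file tells which subrepos
actually moved. The helper names parse_substate and changed_subrepos
are hypothetical, and this is not how Mercurial itself decides what to
push.

```python
def parse_substate(text):
    """Map subrepo path -> pinned revision from .hgsubstate content."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line has the form "<revision hash> <subrepo path>".
        revision, path = line.split(" ", 1)
        state[path] = revision
    return state


def changed_subrepos(old_text, new_text):
    """Return the sorted paths whose pinned revision differs between
    two versions of .hgsubstate, including newly added subrepos."""
    old = parse_substate(old_text)
    new = parse_substate(new_text)
    return sorted(path for path, rev in new.items() if old.get(path) != rev)
```

With a check like this, a push would only need to contact the remote
repositories of the subrepos returned by changed_subrepos, and could
skip all network round trips when the list is empty.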




Summary:

- A way of doing partial checkouts in Mercurial would make it a lot
easier to use subrepositories to manage dependencies. It would be fine
if neither the history nor the directory structure was trimmed. I'd be
happy with a way to just 'chroot' the sandbox.

- Commit and push propagation must not affect subrepositories unless
explicitly wanted by the user. Usually commit messages from the main
repository make little to no sense for the subrepositories. Unless the
.hgsubstate file has changed, pushing subrepositories is just a waste of
time and bandwidth.


[1] http://code.djangoproject.com/svn/django/trunk/
[2] https://hg.labs.terreon.de/common/chimes/


--mp

