A thought on subrepos

Sun Apr 17 18:20:14 CDT 2011

On 14 April 2011, Matt Mackall said:
> It seems many projects with subrepos are structured like:
> 
> app/ <- main repo
>  lib/ <- a subrepo
> 
> This is perhaps the most obvious way to do things, but is not really
> ideal. A better way is:
> 
> build/ <- main repo
>  app/ <- subrepo
>  lib/ <- subrepo

I've been having similar thoughts, but I don't know how to solve the
central problem: if I break our overgrown, entangled,
single-giant-source-tree up into multiple repos, how do developers get
a working build environment without checking out *everything*?  (If
you have to check out *everything*, then why bother splitting up the
tree?)

Let me make it a little more clear.  Our source tree looks something
like this:

  backend/
    lib1/
    lib2/
    stuff/
      lib3/
      app1/
      lib4/
    app2/
    morestuff/
      lib5/
      app3/
  libs/
    lib6/
    lib7/
    ...
    lib15/
  db/
    schema1/
    schema2/
    lib16/
    lib17/
  web/
    lib18/
    app4/
    app5/
  frontend/
    app6/
    app7/

...and so on, and so on.  (We build ~300 individual apps, ranging from
ridiculously tiny to impressively large.  There are probably ~100
distinct libraries, but I've never really counted.)

Let's say I want to split this up: there's no reason for backend
developers to have to see web or frontend code, and there's no reason
for frontend developers to have to see db or backend code.  But both
web and backend developers need db, and everyone needs libs.
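
To make those dependencies concrete, here is a minimal Python sketch.
The DEPENDS table is only my reading of the paragraph above, and the
helper name needed_repos is made up for illustration.

```python
# Direct dependencies between the top-level directories, as described
# above: backend and web need db, and everyone needs libs.
DEPENDS = {
    "backend": ["libs", "db"],
    "web": ["libs", "db"],
    "frontend": ["libs"],
    "db": [],
    "libs": [],
}


def needed_repos(component, depends=DEPENDS):
    """Return the component plus everything it depends on, transitively."""
    seen = []
    stack = [component]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.append(name)
        stack.extend(depends.get(name, []))
    return sorted(seen)


for team in ("backend", "web", "frontend"):
    print(team, "->", needed_repos(team))
```

That is the set each group of developers would ideally get from a
single clone, and no more.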

So the obvious thing to do is replace our one-big-source-tree with:

  Makefile
  .hgsub:
    backend = backend
    libs = libs
    db = db
    web = web
    frontend = frontend
  .hgsubstate

Then, the small minority of developers who like to see everything can
do so, and the build system can see everything.  But that does nothing
for the majority of developers, who only want to see the code that
they work on plus its dependencies.  Hmmm.  That won't work: to get
anything less than everything, you have to manually clone libs, db,
and backend.  Or libs and frontend.  Not very helpful.

Next idea: use subrepos for dependencies.

  backend:
    Makefile
    .hgsub:
      libs = libs
      db = db

  web:
    Makefile
    .hgsub:
      libs = libs
      db = db

  frontend:
    Makefile
    .hgsub:
      libs = libs

But that sucks: it means you will have 3 copies of libs/ and 2 copies
of db/ if you want to check out everything!  Yuck.
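
The duplication is easy to tally.  A small Python sketch, where the
NESTED table just restates the three .hgsub files above and
count_copies is a name I made up:

```python
from collections import Counter

# Subrepos nested inside each top-level repo, per the .hgsub files above.
NESTED = {
    "backend": ["libs", "db"],
    "web": ["libs", "db"],
    "frontend": ["libs"],
}


def count_copies(nested):
    """Count the working copies of each subrepo in a full checkout."""
    return Counter(sub for subs in nested.values() for sub in subs)


copies = count_copies(NESTED)
print(dict(copies))
```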

Hmmm.  Now I think I see the point of Matt's idea.  In our case, it
might look something like this:

  everything:
    Makefile
    .hgsub:
      backend = backend
      libs = libs
      db = db
      web = web
      frontend = frontend

  backend-env:
    Makefile
    .hgsub:
      backend = backend
      libs = libs
      db = db

  web-env:
    Makefile
    .hgsub:
      web = web
      libs = libs
      db = db

  frontend-env:
    Makefile
    .hgsub:
      frontend = frontend
      libs = libs

On further reflection, using subrepos for "everything" might be
misguided.  The main purpose of that would be "get me the latest
version of everything" -- maybe for a nightly build, or maybe because
we want to make a change to a widely-used library.  In that
case, the strict version tracking provided by subrepos might be a bad
fit.  Maybe "check out everything" should be a small shell script
instead of a super-repo.
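
Something along these lines, perhaps, sketched in Python rather than
shell.  The server URL is a placeholder and the function names are
invented; the only real commands assumed are "hg clone" and
"hg pull -u".  By default it only prints what it would do; pass --run
to actually execute the commands.

```python
import os
import subprocess
import sys

# Hypothetical server location; substitute wherever the repos really live.
BASE_URL = "https://hg.example.com/"
REPOS = ["backend", "libs", "db", "web", "frontend"]


def plan_commands(repos=REPOS, base_url=BASE_URL, exists=os.path.isdir):
    """Return one hg command per repo: clone if missing, else pull and update."""
    commands = []
    for name in repos:
        if exists(name):
            # Already cloned: fetch new changesets and update the working copy.
            commands.append(["hg", "-R", name, "pull", "-u"])
        else:
            commands.append(["hg", "clone", base_url + name, name])
    return commands


def main(argv):
    dry_run = "--run" not in argv
    for cmd in plan_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Unlike a super-repo, this records nothing about which revisions go
together: it simply brings each repo up to date with whatever the
server has.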

(And of course, maybe splitting our large source tree up isn't a great
idea.  Maybe if we had a less insane build system, we would not mind
having all of our code in one big tree so much.  ;-)

        Greg