Handling thirdparty 'vendor' branches

Dov Feldstern dfeldstern at fastimap.com
Fri Jul 11 03:49:35 CDT 2008


Giorgos Keramidas wrote:
> I'm trying to understand how we can create 'vendor branches' in
> Mercurial.
> 

[...]

> That's amazing.  It's _exactly_ what I wanted Hg to do.  To track the
> rename of the vendor code only in the 'target' tree, and then DTRT when
> merges need to touch/affect files in the target tree.  Fantastic :)
> 

This really is very cool! :)

> 
> Now what?
> =========
> 
> Now my main concerns about this are:
> 
>   * Does this look like a reasonable way to handle thirdparty
>     code imports?
> 
>   * Am I missing something that may potentially mess things up,
>     if I keep pulling and merging from a 'clean vendor branch' of
>     imports like this?
> 
>   * This seems to work nicely with just one thirdparty component,
>     but do you see any potential pitfalls when pulling and
>     merging dozens of thirdparty components?
> 
> 

I don't see any outright problem with this. However, I'm wondering if having all 
of these components tied together in one repository is desirable. Have you 
considered using hgforest instead? I'm working with it (using only its very 
basic functionality --- not really using snapshot files) and am quite happy with it.

I'll explain where I'm coming from --- I think it is similar to the scenario 
you're describing:

At work, I'm a developer on a component, which is one of dozens that make up a 
project. As a component developer, I normally change only the code of our 
component. However, it is useful (and sometimes necessary) to have access to the 
code of other components as well --- and certainly all components are necessary 
for building the entire project.

Additionally, we often need to "mix and match" various versions of the 
components. For example, on one branch, say for feature X, consisting of 
components A, B and C, we require versions A.1, B.1 and C.2; and on another 
branch, for feature Y, we use A.2, B.2 and C.2; etc. So it's much more 
convenient to be able to choose a specific version of each component 
independently of the versions chosen for the other components. This becomes much 
more difficult with a monolithic repository like the one you described above.

So, how can this be solved with a forest?

Each component would have its own repository. In order to get a complete tree, 
you clone each component's repository to its correct place within a tree 
structure. This becomes a forest, and using the hgforest extension eases some 
aspects of manipulating the tree all at once, rather than having to manipulate 
each component separately.
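
A minimal sketch of that layout step, using plain `hg clone` calls driven from 
Python rather than the hgforest extension itself. The component names, paths and 
URLs are made up for illustration, and `clone_commands` / `clone_all` are 
hypothetical helpers, not part of Mercurial or hgforest:

```python
import subprocess
from pathlib import Path

# Hypothetical layout: path inside the tree -> repository to clone from.
# The component names and URLs are placeholders, not real repositories.
COMPONENTS = {
    "A": "http://hg.example.com/A",
    "lib/B": "http://hg.example.com/B",
    "lib/C": "http://hg.example.com/C",
}


def clone_commands(root, components):
    """Build one `hg clone SOURCE DEST` command per component."""
    root = Path(root)
    return [
        ["hg", "clone", source, str(root / relpath)]
        for relpath, source in sorted(components.items())
    ]


def clone_all(root, components):
    """Clone every component to its place under root."""
    for cmd in clone_commands(root, components):
        # Make sure the parent directory exists before cloning into it.
        Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
```

Calling `clone_all("project", COMPONENTS)` would then populate `project/A`, 
`project/lib/B` and `project/lib/C`, each as an independent repository.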

Further, if you also want to track "the configuration" --- i.e., which version 
of each component is being used for a given version or branch of the entire 
project --- that can (or should be able to) be done with snapshot files: the 
top-level of the tree in which all clones of the components are placed would 
itself be an hg repository, and would contain (besides "hosting" the components, 
which are *not* part of its repository) a single snapshot file which is tracked, 
and which specifies which version of each component is being used. By tracking 
its history, the project's configuration is versioned.
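
As a rough illustration of the idea only: the sketch below records one changeset 
id per component in a small text file and turns such a file back into `hg update` 
commands. The "ID PATH" line format and all function names are assumptions made 
up for this example; this is not the snapshot file format that hgforest itself 
uses.

```python
import subprocess
from pathlib import Path


def current_revision(repo):
    """Ask Mercurial which changeset a component's working directory is at."""
    out = subprocess.check_output(["hg", "-R", str(repo), "identify", "--id"])
    # A trailing "+" marks uncommitted changes; drop it for the snapshot.
    return out.decode().strip().rstrip("+")


def format_snapshot(revisions):
    """Render {relative path: changeset id} as 'ID PATH' lines (made-up format)."""
    return "".join(f"{rev} {path}\n" for path, rev in sorted(revisions.items()))


def parse_snapshot(text):
    """Inverse of format_snapshot."""
    revisions = {}
    for line in text.splitlines():
        if line.strip():
            rev, path = line.split(None, 1)
            revisions[path] = rev
    return revisions


def update_commands(root, revisions):
    """Build the `hg update` commands that restore a recorded configuration."""
    root = Path(root)
    return [
        ["hg", "-R", str(root / path), "update", "--rev", rev]
        for path, rev in sorted(revisions.items())
    ]
```

The text produced by `format_snapshot` is what the top-level repository would 
commit; checking out an older revision of that file and running the commands from 
`update_commands` would bring each component back to the recorded version.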

Some glue is necessary to tie all these things together and make them 
convenient to use, but I think that should be rather easy to do.

Two notes:

(1) http://www.selenic.com/mercurial/wiki/index.cgi/NestedRepositories could 
become an alternative to hgforest

(2) I've never actually implemented the latter part (tracking the 
configuration). At work, ClearCase is used as our SCM. However, Mercurial is so 
much nicer (IMO) that I transfer baselines from CC into Mercurial, do my 
development there, and when I'm ready, transfer it back to CC. Even given the 
overhead of the two-way conversion (which I do semi-automatically), I find it 
much nicer to work this way. However, given this overhead, I haven't fully 
implemented the strategy outlined above. Rather, I fully track the component 
that I work on in one repository, and track "everything else" in a second, 
single repository. Given that changes to the configuration ("everything else"), 
within the context of my component, are less frequent, I find this a manageable 
compromise.

(Now the rest is off-topic: sure, I hope to convert others at work to Mercurial, 
and maybe someday actually switch over officially. However, this will take time: 
we have thousands of developers all over the world... Specifically, what 
currently stops me from getting *anyone* else to use Mercurial is this: (a) 
virtually everyone at work develops on Windows. Now, ClearCase provides support 
for symbolic links, even on Windows (by providing a virtual filesystem, maybe 
Samba based?). So (b) our tree depends on symbolic links. Because of (b), once 
our trees are converted to a local filesystem (as in Mercurial's working 
directory), development can only be done on a filesystem that supports symbolic 
links; and because of (a), no one develops on such a system (I myself am working 
with CoLinux, so for me it's possible; but converting everyone to Linux would be 
even harder, I think, than converting them to Mercurial). I'm working (slowly, 
in my spare time) on an extension which I hope will provide symbolic link 
simulation, and then maybe I'll be able to get some people to start playing 
around with this. Probably hopeless, but worth a shot... ;) )

Dov

