Repository Design Question

Thu Jul 19 09:57:37 CDT 2007

On 7/18/07, Nathaniel Filardo <nwfilardo at gmail.com> wrote:
...a bunch of stuff marked [A] through [E] with the wrong behavior

Things will work out for you if you use a slightly different
procedure. I'll re-write your example to demonstrate.

> ## [A] Initialize repository and import the full test suite
$ hg init
$ touch p2p3 p2only p3only
$ hg commit -A -m Base

(which is the same as in your example except for the common file)

> ## [B] Specialize the P2 branch
$ hg branch P2
$ hg rm p3only
$ hg commit -m "P2 only"
$ ls
p2only p2p3

(nothing to see here, please move along)

> ## [C] Go back to common and specialize the P3 branch
$ hg update -C default
$ ls
p2only p2p3 p3only
$ hg branch P3
$ hg rm p2only
$ hg commit -m "P3 only"
$ ls
p2p3 p3only

This piece is only slightly different; I refer to the branch "default"
instead of spelling out a specific revision which happens to be on
that branch. Much easier, especially once we start adding additional
commits to the default branch.

> ## [D] Let's see what happens if we merge

Let's just...not do this part. Or E. Instead, let's say we want to do a bugfix.

## [D*] Perform a bugfix.
$ hg update -C default
$ echo foo > p2only
$ hg commit -m "P2 bugfix"

Okay, now let's move it into P2 and P3. First, P2:

## [E*] Move bugfix into P2.
$ hg update -C P2
$ hg merge default
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)
$ cat p2only
foo
$ hg status -A
M p2only
C p2p3
$ hg commit -m "merged with default"

Just as we expected; p2only picked up the change, p2p3 is "clean", and
p3only is still deleted. Okay, so let's move it into P3. We expect the
merge into P3 to be a no-op, since P3 doesn't have the file p2only.

## [F*] Move bugfix into P3.
$ hg update -C P3
$ hg merge default
remote changed p2only which local deleted
(k)eep or (d)elete? d
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)
$ hg status -A
C p2p3
C p3only
$ hg commit -m "merged with default"

There's the expected no-op, and p2only did not reappear. The prompt
is annoying, admittedly, but it's better than what you were doing. I'm
not sure whether it can be turned off.

So, in summary, the branches P2 and P3 should contain *only* those
changes necessary to turn a common tree into a specialized tree. Don't
do development in either of them; all changes should be made to the
core, unspecialized branch--default--and then merged into P2 and P3.
When you do that, merge behaves the way you want it to.

> Fundamentally, the design-decision reason for this behavior seems to be that
> branches are ephemeral and intended to be merge-once.

No, that's not it, and they're definitely not supposed to be
merge-once--they can't be in a distributed system or it wouldn't work
at all. The whole point is to automate repeated merges.

The issue is that branching and merging aren't quite what you think they are.

Branches are like sets (in the mathematical sense) of changes, and a
merge takes the union of two such sets. When you take the union of the
sets P2 and P3, you get all the changes from both. Since P2 deleted
p3only and P3 deleted p2only, the union contains deletions of both
p2only and p3only.
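
To make the set picture concrete, here is a toy sketch in plain shell. It
does not call hg at all; the change names (add-p2p3, rm-p3only, fix-p2only,
and so on) and the union helper are made up for illustration, and real
merges work on a revision graph rather than on flat lists. It only shows
why the union of P2 and P3 carries both deletions, while the union of
default and P3 carries just one.

```shell
#!/bin/sh
# Toy model: a branch is a set of change names, one per line.
# All names here are hypothetical labels, not hg revisions.

default_changes="add-p2p3
add-p2only
add-p3only
fix-p2only"

# P2 and P3 each add exactly one specializing change on top of the base.
p2_changes="add-p2p3
add-p2only
add-p3only
rm-p3only"

p3_changes="add-p2p3
add-p2only
add-p3only
rm-p2only"

# Model a merge as the union of two sets (sorted, duplicates dropped).
union() {
    printf '%s\n%s\n' "$1" "$2" | LC_ALL=C sort -u
}

# Merging P2 with P3 picks up both deletions.
p2_merged_with_p3=$(union "$p2_changes" "$p3_changes")

# Merging default into P3 picks up the bugfix and keeps only P3's deletion.
p3_merged_with_default=$(union "$p3_changes" "$default_changes")

echo "P2 + P3:"
echo "$p2_merged_with_p3"
echo "P3 + default:"
echo "$p3_merged_with_default"
```

The first list contains both rm-p2only and rm-p3only; the second contains
rm-p2only and fix-p2only but no rm-p3only.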

So, don't do that. In your case, you want *three* branches, not just two:

1) a branch which is not specific to anything
2) a branch that is the union of (1) and changes that make it P2-specific
3) a branch that is the union of (1) and changes that make it P3-specific

As long as you merge 1 and 2 or 1 and 3 but never 2 and 3, you'll
never have to revert anything while merging.

(To complete the analogy, you can think of revert in the context of a
merge being like replacing one of the arguments of the merge/union
with a subset of that argument.)

Make sense?

Now, one final point. I think you'll find things much easier to keep
straight if, instead of using the many-branches-in-one-repo paradigm,
you use a one-branch-per-repo paradigm. The two approaches are
conceptually identical, but you don't have to be as painstakingly
careful to be on the correct branch as you do with the multi-branch
approach. Plus, one-branch-per-repo is better tested; sane
multi-branch support is brand new.

Basically:

$ mkdir main-branch
$ cd main-branch; hg init; touch p2p3 p2only p3only; hg commit -A -m Base; cd ..
$ hg clone main-branch P2-branch
$ cd P2-branch; hg rm p3only; hg commit -m "P2 only"; cd ..
$ hg clone main-branch P3-branch
$ cd P3-branch; hg rm p2only; hg commit -m "P3 only"; cd ..
$ cd main-branch; echo foo >p2only; hg commit -m "bugfix"; cd ..
$ cd P2-branch; hg pull; hg merge; hg commit -m "merged with main"; cd ..
$ cd P3-branch; hg pull; hg merge; hg commit -m "merged with main"; cd ..

(As in [F*] above, the merge in P3-branch will ask whether to keep or
delete p2only; answer d.)

- Evan