Strategies for push/merge problem?

Tue Jul 29 16:56:08 CDT 2008

>-----Original Message-----
>From: mercurial-bounces at selenic.com 
>[mailto:mercurial-bounces at selenic.com] On Behalf Of Giorgos Keramidas
>Sent: Tuesday, July 29, 2008 5:15 PM
>To: Christopher Weimann
>Cc: Mercurial Users
>Subject: Re: Strategies for push/merge problem?
>
>On Tue, 29 Jul 2008 16:42:09 -0400, Christopher Weimann 
><christopher at weimann.us> wrote:
>>Douglas Philips wrote:
>>> Let me see if I understand this correctly:
>>> Developer A changes files f1.c, f2.c, f3.c
>>> Developer B changes files q1.c, q2.c, q3.c
>>>
>>> because both change independent sets of files, Subversion will let
>>> them both push, creating a merged repo of A's f1.c, f2.c, f3.c and
>>> B's q1.c, q2.c, q3.c and neither will know or be told that what has
>>> been created in the central repo never actually existed on any
>>> developer's machine?
>>>
>>
>> You are correct.  That matches my understanding of CVS and Subversion.
>> I just tried it to make sure my understanding is correct.  Developer A
>> can change a header file or a library source file and commit that.
>> Developer B, who hasn't updated yet, changes some application source
>> that uses that header or library and commits and nobody is the wiser
>> that the repository now holds a code set that may be completely
>> broken.
>
>Precisely.  This is why some of the things that are 'common' in CVS are
>not even necessary with Hg :)
>
>> Mercurial would prevent you from pushing because there would be new
>> heads.  This points out to you that something has changed and requires
>> some action on the developer's part.
>>
>> This entire thread boils down to this difference.
>>
>> Subversion will blindly accept changes to different files as though
>> those files have no relation to each other.
>>
>> Mercurial won't.
>>
>> I think this is a Mercurial feature.
>
>I think it's a feature too.
>
>It does not *enforce* anything that can prevent _all_ problems of
>half-done changes, i.e. someone may still do:
>
>    vi include/header.h
>    hg ci -m 'Update header.h for foo' include
>    hg push
>
>    vi lib/libcore/blah.c
>    hg ci -m 'Catch up with header.h'
>    [ forget to push here ]
>
>    # At this point the tree may be broken for everybody else.
>
>Hence it is still possible to do stupid things with Mercurial.  It's
>just a lot easier to _avoid_ being caught by surprise, when one gets
>used to treating changesets in a way that makes them "self-sufficient".
>

I must be missing something here in the workflow.  From what Matt said
earlier, Linus pulls 70 changesets before breakfast.  From what you're
saying here, he would pull a changeset, run all integration tests, pull
the next one, and so on.  Perhaps he has incredibly fast integration
tests, or is a very late eater, but I think it's more realistic that he
pulls several or all of these changesets, merges the ones that don't
conflict, and then runs the integration tests.  If he does the latter,
then what is the difference between that and an automated merge of
non-conflicting changes?  Certainly there will be several changesets
that never existed on any other developer's box, and that may or may
not pass all tests.

In your example above, what's to prevent the developer from emailing
Linus just the header.h change?  He might notice a single-file change,
but what if the developer sent only 19 of 20 changed files?  Would Linus
notice the difference before merging?  Would he know which changeset
caused the problem so he could back it out?  For a complex source base,
the cause can be far from the effect, so finding which of the 70
changesets caused the problem may not be a trivial exercise.

As for broken builds, I think Mercurial is pretty well set up so that
you can have a development repository and a gold repository (e.g. crew
and stable).  The gold one is guaranteed to build; the development one
is undergoing integration tests and will almost always build.  I think
it's a nice balance.  In our setup, developers commit to one repository,
and an automated process checks that it builds, runs the integration
tests, and then pushes the new changesets to the gold repository.
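Roughly, that gate behaves like the following Python model.  It is a
simplified sketch, not our actual tooling and not a Mercurial API: trees
are plain dicts of file contents, and `Changeset`, `GatedRepos`, and the
`builds`/`passes_tests` callables are hypothetical stand-ins for the real
build and integration tests.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A tree is modeled as a mapping of file path -> file contents.
Tree = Dict[str, str]


@dataclass
class Changeset:
    """Hypothetical changeset: who made it and the new contents of each file."""
    author: str
    changes: Dict[str, str]


@dataclass
class GatedRepos:
    """A development repository that feeds a gold repository.

    Commits land in dev immediately; gold only ever receives a tree that
    has built and passed the integration tests.
    """
    gold: Tree
    dev: Tree = field(init=False)
    pending: List[Changeset] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The development repository starts out as a copy of gold.
        self.dev = dict(self.gold)

    def commit(self, cs: Changeset) -> None:
        # Developers commit straight to the development repository.
        self.dev.update(cs.changes)
        self.pending.append(cs)

    def run_gate(self, builds: Callable[[Tree], bool],
                 passes_tests: Callable[[Tree], bool]) -> bool:
        # Check the whole development tree, not individual changesets.
        if builds(self.dev) and passes_tests(self.dev):
            # Promote every pending changeset to gold as one batch.
            self.gold = dict(self.dev)
            self.pending.clear()
            return True
        # A failing tree stays in dev; gold is left untouched.
        return False
```

Replaying Giorgos's header.h example against this model: the header-only
commit sits in the development repository and fails the gate, and gold
only moves once the follow-up commit makes the tree consistent again.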

Furthermore, I don't think it's realistic to have just one person doing
the pulls.  No single person understands or owns the entire source
base; it was written over 20 years by hundreds of people.  At best that
person would just be rubber-stamping most of the changes.  And we have
hundreds of developers and testers around the world.  Waiting for our
corporate Linus to come in, in whatever time zone he or she is in,
would mean that we could only get one build a day with new changes.
This would be extremely wasteful compared to our current rolling builds,
which appear every 2.5 to 3 hours or so (yes, it takes that long to
compile).  So we'd need multiple Linuses (Linii?), at least four plus
backups to cover national holidays, vacations, etc.  Now we have 8 or 9
people doing the pulling, none of whom can possibly know more than a
small chunk of the source base, so again they are just blindly pulling.
If we need to distract that many people to monitor a queue of incoming
changes, we might as well write a queuing system to pull the changes and
reject them in case of conflict.  And in that case, we may as well have
Hg support a push model where, if your changes do not conflict, it does
the merge automatically.  So I agree with the corporate crowd that this
is the right way to go.
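Such a queue might look roughly like the following Python model.
Everything in it is hypothetical: the `Push` record, the
`AutoMergeQueue` class, and the conflict rule, which treats two pushes
as conflicting only if they touch the same file, roughly like the
Subversion behavior described earlier in the thread.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# A tree is modeled as a mapping of file path -> file contents.
Tree = Dict[str, str]


@dataclass
class Push:
    """Hypothetical incoming push: the central revision it was based on,
    plus the new contents of each file it touched."""
    author: str
    base_rev: int
    changes: Dict[str, str]


class AutoMergeQueue:
    """Process pushes in arrival order.  A push is merged automatically
    unless a file it touches was changed centrally after its base
    revision (a file-level conflict rule)."""

    def __init__(self, tree: Tree) -> None:
        self.history: List[Tree] = [dict(tree)]  # history[rev] = tree at rev
        self.touched: List[Set[str]] = [set()]   # files changed by each rev

    @property
    def head(self) -> int:
        return len(self.history) - 1

    def submit(self, push: Push) -> Tuple[bool, str]:
        # Collect every file changed centrally since the push's base.
        changed_since_base: Set[str] = set()
        for rev in range(push.base_rev + 1, self.head + 1):
            changed_since_base |= self.touched[rev]
        overlap = changed_since_base & set(push.changes)
        if overlap:
            return False, "rejected: conflict in " + ", ".join(sorted(overlap))
        # No overlapping files: apply the push on top of the current head.
        merged = dict(self.history[-1])
        merged.update(push.changes)
        self.history.append(merged)
        self.touched.append(set(push.changes))
        return True, f"merged as rev {self.head}"
```

If A pushes a header change and B, starting from the same base, pushes
an application change, both are accepted, and the resulting head
combines the two into a tree that neither developer ever built.  That
is exactly the trade-off this thread is arguing about.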

chuck