Merge on push?

Peter Arrenbrecht peter.arrenbrecht at gmail.com
Fri Jun 13 01:53:04 CDT 2008


On Fri, Jun 13, 2008 at 2:28 AM, Jesse Glick <jesse.glick at sun.com> wrote:
> Roman Kennke wrote:
>> I'd like to propose an extension to the push command. I imagine an
>> option --merge [...] which makes push try a merge on the remote side
>> if necessary [...and] then try to perform a non-interactive merge and
>> commit on the remote repository, when the push would create an
>> additional head. This should fail when the changes touch files that
>> have been touched by other changesets since their 'branch-point'.
>
> Something like this would be really, really valuable. I can concur that
> the need to explicitly pull and merge before pushing, even when your
> changes are unrelated to anything else that has been done remotely, is
> the single biggest source of annoyance with Mercurial (compared to CVS)
> in the team I work on.
>
> I think your proposal is sensible. Its main drawback that I can see is
> that it would require support on the server as well as the client (i.e.
> a wire protocol change).
>
> I have also been thinking about an extension command which would
> essentially do
>
> loop:
>   hg push
>   if success:
>     exit 0
>   else:
>     hg fetch
>     if files merged (even w/o conflict) or anything else unusual:
>       exit 1
>
> This would, I think, be relatively easy to implement. Call it 'hg synch'
> perhaps: when successful it means your local and remote repos should be
> identical. You could imagine some refinements to let you push and fetch
> only the active named branch, etc. Unlike using CVS/SVN, you still have
> a full record of what the developer originally committed vs. what was
> merged.
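
The loop above can be sketched concretely. Here is a minimal Python
sketch of that logic, where push and fetch are hypothetical callables
standing in for shelling out to 'hg push' and 'hg fetch' (fetch
returning True only when the merge was trivial and nothing else
unusual happened):

```python
def synch(push, fetch, max_tries=5):
    # Retry push, fetching (pull + merge + commit) between attempts.
    # push() and fetch() are placeholders for the real hg commands,
    # e.g. subprocess.call(['hg', 'push']) == 0.
    for _ in range(max_tries):
        if push():       # push succeeded: local and remote now identical
            return 0
        if not fetch():  # files merged or anything else unusual: give up
            return 1
    return 1             # still racing other pushers after max_tries

```

Bounding the number of attempts is an assumption on top of the original
sketch; it keeps the command from looping forever against a very busy
central repository.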
>
> 'hg synch' would force you to pull and update remote changes even if you
> would rather not do so just yet, unlike push --merge, although you could
> always use 'hg up <older>' to go back if you wanted to. In my experience
> it is unusual to not want to get the latest upstream stuff anyway; if
> nothing else, your chance of future merge conflicts goes down the newer
> your local repo is. (Under CVS I would occasionally hear a developer
> working on a small corner of the product pipe up out of the blue and ask
> about a broken source tree. Further questioning would reveal that the
> developer had not updated other parts of the source tree in months!)
>
> 'hg push --merge' could be significantly faster than 'hg synch' because
> doing a merge forces inodes for the whole working copy and parts of the
> repository to be loaded into memory caches. A server processing --merge
> requests would be doing automated merges constantly and keeping
> everything in cache, whereas a developer running synch would likely do
> so infrequently and interspersed with other operations (builds, email,
> ...) which would flush the Hg working copy out of the cache. This is an
> especially important consideration for a big repository that most people
> only work on small portions of at a time, since a developer can 'hg di'
> and 'hg ci' just one subdir fairly quickly but repository-wide
> operations may be many times slower.
>
> 'hg push --merge' would make it possible to push some changes while
> there are still uncommitted modifications elsewhere in the tree, which
> could be valuable, whereas it is hard to see how synch could relax
> fetch's requirement that the working copy be a clean checkout of tip.

I actually started implementing something like this a while back. I
had at first thought it might be a fairly easy thing to add directly
to protocol.unbundle(). However, I no longer think it should be done
in the repo being served directly. The problem is one of locking and
consistency towards clients: the merge will leave the repo with more
than one head per branch for a while, and a failed merge will even
have to be rolled back. So you need a staging repo, which in turn
means administrative decisions etc.

So I think this should be handled by a dedicated pair of extensions,
one client side, one server side, because I don't believe mpm is going
to accept such a thing into the core (understandably).

Client sketch:
  send bundle
  receive bundle
  apply bundle
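
The client side amounts to very little code. Assuming hypothetical
remote/local wrapper objects (none of these method names are real
Mercurial APIs), it could look like:

```python
def client_push_merge(remote, local, bundle):
    # Send our bundle; the server either takes it as-is (replying with
    # an empty/None bundle) or merges it server-side and sends the
    # merge changeset back for us to apply locally.
    reply = remote.send_bundle(bundle)
    if reply is not None:
        local.apply_bundle(reply)

```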

Server sketch:
  receive bundle
  lock repo:
    try to unbundle in repo, aborting on multiple heads
    if aborted:
      prepare staging repo
      unbundle in staging repo
      switch to correct branch
      merge, abort on non-trivial merges
      commit
      run hooks (to, for example, run *very* quick smoke tests)
      pull staging repo into main repo
      send bundle with merge rev back
    else:
      send empty bundle back
  unlock repo
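
The server sketch could be fleshed out roughly as follows. All of the
repo/staging/bundle method names here are assumptions for illustration,
not real Mercurial APIs:

```python
def serve_push_merge(repo, staging, bundle):
    # repo: the served repository; staging: a scratch clone of it;
    # bundle: what the client pushed.  Returns the bundle to send back
    # to the client (None standing for the empty bundle).
    with repo.lock():                   # avoid racing other merges
        if repo.try_unbundle(bundle):   # no extra head would be created
            return None                 # i.e. send an empty bundle back
        staging.prepare_from(repo)      # make staging match the main repo
        staging.unbundle(bundle)
        staging.update_to(bundle.branch())  # switch to the pushed branch
        staging.merge()                 # raises on non-trivial merges
        merge_rev = staging.commit()
        staging.run_hooks()             # e.g. *very* quick smoke tests
        repo.pull_from(staging)         # only now does the main repo change
        return staging.bundle_for(merge_rev)

```

A failed merge only ever touches the staging repo, so the main repo
never needs to roll back and clients never see the intermediate extra
head.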

Keeping the main repo locked for the duration of the merge avoids
races with other server-side merges. And it might be simpler to
just pass the resulting merge rev back to the client and let it pull
again.

As soon as smoke tests fail to run very quickly, I think one should
switch to a model where the client just uploads bundles to a queue and
waits for an asynchronous confirmation of some sort (email?) of whether
integration was successful. She then just pulls manually. This would
avoid the need for per-developer branches.

I already have a prototype for server-side extensions in my rclone
work (see http://freehg.org/u/parren/rexec/), but that's only for
http(s). If anyone is going to pursue this direction, then I suggest
we team up to add a proper and stable interface for server-side
extensions to Hg that works across all the server types (unless that
already exists and I missed it).

If anyone wants to look, I can send the patches for my prototyping on
push-merge. But it's not much really.

>> there is a chance that this results in a broken tree even when there
>> are no overlapping changes, but experience with more traditional
>> RCSes [...] shows that such cases are very rare in a reasonably
>> structured project.
>
> I would second this. At least on the project I work on - with something
> on the order of 1 MLOC over 80k files and >100 developers - "hard" merge
> conflicts (<<< === >>>) are unusual, nonconflicting merges are in many
> cases fine straight from diff3's output without additional fixups, and
> build breakages caused by merging nonoverlapping changesets are
> definitely rare.
>
> Locally validating every merge is out of the question: just a clean
> build and basic smoke test can take upwards of an hour, not to mention
> more advanced tests, some of which require a special environment to be
> set up. Even an unusually patient developer doing such validation of a
> merge would almost certainly find that a new merge was needed
> afterwards, since you can expect a new changeset to be arriving in the
> central repository every few minutes on average during peak hours. It is
> much more practical on such a project to let a continuous builder find
> the occasional problem for you.
>
> The best setup is (arguably) a branch per developer which gets
> independently tested and incrementally merged with other stuff by an
> automated server process. This puts a lot of burden of complexity and
> performance on a server, however, and may be overkill for many projects.

-parren

