Merge on push?

Thu Jun 12 19:28:49 CDT 2008

Roman Kennke wrote:
> I'd like to propose an extension to the push command. I imagine an 
> option --merge [...] which makes push try a merge on the remote side
> if necessary [...and] then try to perform a non-interactive merge and
> commit on the remote repository, when the push would create an
> additional head. This should fail when the changes touch files that
> have been touched by other changesets since their 'branch-point'.

Something like this would be really, really valuable. I can concur that 
the need to explicitly pull and merge before pushing, even when your 
changes are unrelated to anything else that has been done remotely, is 
the single biggest reason for annoyance with Mercurial (compared to CVS) 
in the team I work on.

I think your proposal is sensible. The main drawback I can see is that 
it would require support on the server as well as the client (i.e. a 
wire protocol change).
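
To make the failure condition in the quoted proposal concrete, here is 
a minimal sketch of the check a server might run before attempting the 
automatic merge. It assumes the server can already list the files 
touched on each side since the branch point; the function names and 
the list-of-paths inputs are hypothetical, not existing Mercurial API.

```python
def overlapping_files(incoming_files, remote_files):
    """Files touched by both sides since the branch point (hypothetical helper)."""
    return sorted(set(incoming_files) & set(remote_files))


def can_auto_merge(incoming_files, remote_files):
    """True when the pushed changesets and the remote changesets touch
    disjoint sets of files, i.e. the case where push --merge would proceed."""
    return not overlapping_files(incoming_files, remote_files)
```

With this rule, a push touching only src/a.c would be merged if the 
remote side touched only src/b.c, and refused if both touched src/a.c.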

I have also been thinking about an extension command which would 
essentially do

loop:
   hg push
   if success:
     exit 0
   else:
     hg fetch
     if files merged (even w/o conflict) or anything else unusual:
       exit 1

This would, I think, be relatively easy to implement. Call it 'hg synch' 
perhaps: when it succeeds, your local and remote repos should be 
identical. You could imagine some refinements to let you push and fetch 
only the active named branch, etc. Unlike with CVS/SVN, you still have 
a full record of what the developer originally committed vs. what was 
merged.
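
The loop above could be sketched in Python roughly as follows. This is 
only an illustration under several assumptions: 'hg' is on the PATH, 
the fetch extension is enabled, 'hg push' exits with 0 on success or 1 
when there is nothing to push, and fetch's merge summary contains a 
"N files merged" count. The names synch, run_hg and max_rounds are 
made up for the sketch.

```python
import re
import subprocess

# Assumed format of the merge summary line, e.g.
# "3 files updated, 1 files merged, 0 files removed, 0 files unresolved"
MERGED_RE = re.compile(r"(\d+) files merged")


def run_hg(args):
    """Run hg with the given arguments; return (exit code, combined output)."""
    proc = subprocess.run(["hg", *args], stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    return proc.returncode, proc.stdout


def synch(run=run_hg, max_rounds=5):
    """Push, fetching and retrying whenever the push is refused.

    Returns 0 once the push goes through (or there is nothing to push),
    1 if fetch fails, if any file had to be merged, or if max_rounds
    attempts were not enough.
    """
    for _ in range(max_rounds):
        code, _out = run(["push"])
        if code in (0, 1):  # assumption: 1 means "nothing to push"
            return 0
        code, out = run(["fetch"])
        if code != 0:
            return 1  # fetch itself failed: leave it to the developer
        if any(int(n) > 0 for n in MERGED_RE.findall(out)):
            return 1  # files were merged, even without conflicts
    return 1
```

The run parameter exists only so the loop can be exercised without a 
real repository; max_rounds keeps a busy central repo from making the 
command spin forever.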

'hg synch' would force you to pull and update remote changes even if you 
would rather not do so just yet, unlike push --merge, although you could 
always use 'hg up <older>' to go back if you wanted to. In my experience 
it is unusual to not want to get the latest upstream stuff anyway; if 
nothing else, your chance of future merge conflicts goes down the newer 
your local repo is. (Under CVS I would occasionally hear a developer 
working on a small corner of the product pipe up out of the blue and ask 
about a broken source tree. Further questioning would reveal that the 
developer had not updated other parts of the source tree in months!)

'hg push --merge' could be significantly faster than 'hg synch' because 
doing a merge forces inodes for the whole working copy and parts of the 
repository to be loaded into memory caches. A server processing --merge 
requests would be doing automated merges constantly and keeping 
everything in cache, whereas a developer running synch would likely do 
so infrequently and interspersed with other operations (builds, email, 
...) which would flush the Hg working copy out of the cache. This is an 
especially important consideration for a big repository that most people 
only work on small portions of at a time, since a developer can 'hg di' 
and 'hg ci' just one subdir fairly quickly but repository-wide 
operations may be many times slower.

'hg push --merge' would make it possible to push some changes while 
there are still uncommitted modifications elsewhere in the tree, which 
could be valuable, whereas it is hard to see how synch could relax 
fetch's requirement that the working copy be a clean checkout of tip.

> there is a chance that this results in a broken tree even when there
> are no overlapping changes, but experience with more traditional
> RCSes [...] shows that such cases are very rare in a reasonable
> structured project.

I would second this. At least on the project I work on - with something 
on the order of 1 MLOC over 80k files and >100 developers - "hard" merge 
conflicts (<<< === >>>) are unusual, nonconflicting merges are in many 
cases fine straight from diff3's output without additional fixups, and 
build breakages caused by merging nonoverlapping changesets are 
definitely rare.

Locally validating every merge is out of the question: just a clean 
build and basic smoke test can take upwards of an hour, not to mention 
more advanced tests, some of which require a special environment to be 
set up. Even an unusually patient developer doing such validation of a 
merge would almost certainly find that a new merge was needed 
afterwards, since you can expect a new changeset to be arriving in the 
central repository every few minutes on average during peak hours. It is 
much more practical on such a project to let a continuous builder find 
the occasional problem for you.

The best setup is (arguably) a branch per developer which gets 
independently tested and incrementally merged with other stuff by an 
automated server process. This places a heavy complexity and 
performance burden on the server, however, and may be overkill for 
many projects.