Adding merge --ancestor option?

Angel Ezquerra Moreu angel.ezquerra at gmail.com
Thu Mar 22 11:33:04 CDT 2012


On Thu, Mar 22, 2012 at 4:21 PM, Matt Mackall <mpm at selenic.com> wrote:
> On Thu, 2012-03-22 at 15:44 +0100, Angel Ezquerra Moreu wrote:
>> On Thu, Mar 22, 2012 at 2:37 PM, Matt Mackall <mpm at selenic.com> wrote:
>> > On Wed, 2012-03-21 at 20:48 -0400, Greg Ward wrote:
>> >> On 21 March 2012, Matt Mackall said:
>> >> > > 1) Alice and Bob are working concurrently from the same changeset on
>> >> > >    branch 1.0
>> >> > > 2) Alice commits on 1.0
>> >> > > 3) Alice merges to 1.1
>> >> > > 4) Alice merges to default
>> >> > > 5) Bob commits on 1.0
>> >> > > 6) Bob merges to 1.1, gets a conflict, resolves it
>> >> > > 7) Bob merges to default
>> >> > > 8) Alice pushes and goes home: she's done her day's work
>> >> > > 9) Bob attempts to push and fails: "push creates remote heads"
>> >> > > 10) Bob pulls
>> >> > > 11) Bob merges with Alice on 1.0, 1.1, and trunk
>> >> > > 12) Bob pushes and goes home: he's done his day's work
>> >> > > 13) Carl starts work at the tip of branch 1.0 (Bob's merge with Alice)
>> >> > > 14) Carl merges 1.0 to 1.1: FAIL: he gets Bob's conflict!
>> >> >
>> >> > This is yet another case where we can't do any meaningful
>> >> > differentiation between possible ancestors (the commits in (2) and (5)
>> >> > in this case). We could perhaps walk the graph and notice that (5) has a
>> >> > descendant merge with a conflict, and thus score it higher, but it'll
>> >> > still be trivial to create scenarios with ties.
>> >>
>> >> I was confused at first by how you can detect conflict after-the-fact.
>> >
>> > Simple. A merge without conflicts will have no files listed in the
>> > changeset. In this scheme, we'd try to pick the merge path that had the
>> > most conflicts already resolved. So we'd notice that one of the choices
>> > of ancestor implied merge 'legs' including Bob's conflict resolution
>> > from (6) and choose it over the one with no resolutions in its legs.
>> >
>> > This tweak is much more work than it's worth, though, as it nibbles only
>> > a small chunk off the ambiguous domain.
>> >
>> >> > So there are two ways we can go:
>> >> >
>> >> > - allow manual ancestor selection (restricted to heads(::x and ::y))?
>> >> > - invent a merge operator that's well-defined for multiple ancestors
>> >> >
>> >> > It's not too hard to see how the latter might work, if we ignore
>> >> > renames.
>> >>
>> >> That would indeed be nifty. I'll have to screw on the old thinking cap
>> >> and cogitate over this a bit.
>> >
>> > I'm starting to write up some design notes for this idea, which I'm
>> > calling "consensus merge".
>> >
>> > A quick measurement on the Mercurial repo shows:
>> >
>> > 1911 merges
>> > 83 with two or more merge ancestors
>> > 1 with three
>>
>> Matt,
>>
>> is there a simple way (e.g. revset) to repeat that measurement? I
>> suspect that mercurial's history is probably more linear than most,
>> given the patch based workflow, the excellent review process and the
>> high commit quality standards. The fact that there are only 2 named
>> branches probably contributes to that as well.
>>
>> I could repeat those measurements on some of our repos to give you
>> another measurement point.
>
> I did this:
>
> hg log --template '{rev}\n' -r 'merge()' > merges
> for f in `cat merges`; do echo -n "$f: "; hg log -r "heads(::p1($f) and ::p2($f))" --template "{rev} "; echo; done > merge-ancestors
>
> You can also do something like this:
>
> $ hg dbsh
> loaded repo : /home/mpm/hg
> using source: /home/mpm/hg/mercurial
>>>> d = {}
>>>> for m in repo.revs("merge()"):
> ...   d[m] = repo.revs('heads(::p1(%d) and ::p2(%d))', m, m)
> ...
>>>> len(d)
> 1911
>>>> len([x for x in d if len(d[x]) >= 2])
> 83
>
> It's actually not clear from this measurement that any of these merges
> were 'ambiguous' based on the current algorithm, which picks the first
> common ancestor furthest from root.
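The candidate set computed by the revset `heads(::p1(x) and ::p2(x))` above, and the "furthest from root" choice among those candidates, can be modelled on a toy history to see where ties come from. What follows is a minimal Python sketch, not Mercurial's code: the `parents` dict, the helper names, taking "distance from root" to mean the longest path from a root, and breaking ties by lowest revision number are all assumptions made for illustration. The example history is a criss-cross merge, where two commits have each been merged into the other.

```python
"""Toy model of merge-ancestor candidates in a revision DAG.

Assumptions (not Mercurial's implementation): revisions are ints, `parents`
maps each revision to a tuple of its parents, and depth is the longest path
from a root.
"""
from functools import lru_cache

# Criss-cross history: 1 and 2 both branch from 0, then each is merged into
# the other (3 and 4), so merging 3 with 4 has two candidate ancestors.
parents = {
    0: (),
    1: (0,),    # e.g. Alice's commit
    2: (0,),    # e.g. Bob's commit
    3: (1, 2),  # merge of 1 and 2
    4: (2, 1),  # merge of 2 and 1
}


def ancestors(rev):
    """Return ::rev, i.e. rev together with all of its ancestors."""
    seen, stack = set(), [rev]
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(parents[r])
    return seen


def heads(revs):
    """Members of revs that are not a parent of another member of revs."""
    return {r for r in revs if not any(r in parents[c] for c in revs)}


@lru_cache(maxsize=None)
def depth(rev):
    """Longest path from a root; roots have depth 0."""
    return max((depth(p) + 1 for p in parents[rev]), default=0)


def candidate_ancestors(a, b):
    """Counterpart of the revset heads(::a and ::b)."""
    return heads(ancestors(a) & ancestors(b))


def pick_ancestor(a, b):
    """Pick the candidate furthest from a root; ties go to the lowest rev."""
    return min(candidate_ancestors(a, b), key=lambda r: (-depth(r), r))


print(sorted(candidate_ancestors(3, 4)))  # [1, 2]
print(pick_ancestor(3, 4))                # 1
```

Both candidates sit at depth 1, so the choice between revisions 1 and 2 comes down to the arbitrary tie-break, which is the kind of ambiguity under discussion in this thread.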

Umm, I am a bit surprised. I tried this on 3 of our repos. Looking at
the 3 corresponding merge-ancestors files, none of them has a line
showing more than 1 possible ancestor. (If I understood what you did
properly, in cases where there is more than 1 ancestor I should get a
line such as "148: 124 131", right?)
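On larger repos, a small filter can pick those lines out of a merge-ancestors file instead of scanning it by eye. This is a minimal Python sketch assuming every line has the `REV: ANCESTOR ANCESTOR ...` shape written by the shell loop quoted above (including the trailing space from the `"{rev} "` template); the function name and the sample lines are hypothetical.

```python
def ambiguous_merges(lines):
    """Yield (merge_rev, ancestor_revs) for lines listing two or more ancestors.

    Each line is assumed to look like "148: 124 131 ", as written by
    the shell loop that produced the merge-ancestors file.
    """
    for line in lines:
        rev, sep, rest = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        ancestors = rest.split()
        if len(ancestors) >= 2:
            yield int(rev), [int(a) for a in ancestors]


# Hypothetical sample lines in the merge-ancestors format.
sample = ["147: 140 \n", "148: 124 131 \n", "150: 149 \n"]
found = list(ambiguous_merges(sample))
print(found)       # [(148, [124, 131])]
print(len(found))  # 1
```

To run it against a real file, pass an open file object instead of the sample list, e.g. `list(ambiguous_merges(open("merge-ancestors")))`.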

In particular, this is the data I got:

- Repo 1: 1270 revisions, 158 merges, 27 branches (16 inactive, 4 closed)
- Repo 2: 1054 revisions,  82 merges, 22 branches (11 inactive, 3 closed)
- Repo 3:  513 revisions,  41 merges, 10 branches (2 inactive, 1 closed)

In all cases the number of merges that may be ambiguous is 0.

Cheers,

Angel


More information about the Mercurial-devel mailing list