[PATCH] merge: add automatic tag merge algorithm

Angel Ezquerra angel.ezquerra at gmail.com
Sat Feb 22 20:19:36 CST 2014

On Sat, Feb 22, 2014 at 5:59 PM, Mads Kiilerich <mads at kiilerich.com> wrote:
> On 02/19/2014 12:28 AM, Angel Ezquerra wrote:
>> # HG changeset patch
>> # User Angel Ezquerra <angel.ezquerra at gmail.com>
>> # Date 1392597815 -3600
>> #      Mon Feb 17 01:43:35 2014 +0100
>> # Node ID f3eb8304d9bb59e78b50b42a9341a2063e1cb451
>> # Parent  7648e9aef6eeab00a0946e877690e94fb12d389b
>> merge: add automatic tag merge algorithm
>> Try to automatically merge conflicting .hgtags files using the following
>> algorithm:
>> - keep all the tags that are on only one of the merge parents
>> - out of the common tags:
>>      - keep those that are exactly the same (including history) in both
>> parents
>>      - keep those whose "full tag history" (i.e. current node plus tag
>> history)
>>      is a super-set of the other merge parent "full tag history"
>> - If there are no tags left the merge was successful.
>> - If there are any tags left there was a conflict and the automated merge
>> failed
>>      - When this happens, fall back to a regular file merge, which in most
>> cases
>>      should open a merge tool (at which point the user can manually
>> resolve the
>>      conflicts).
>> # Notes:
>> - The algorithm assumes that tags are never removed from the .hgtags file.
>> This
>> should be true in most cases (particularly if the user does not normally
>> modify
>> the .hgtags file on its own). We could improve this merge algorithm to
>> handle
>> that case as well.
>> - When the automated merge algorithm is successful, the merged tags are
>> written
>> in alphabetical order. There main reason is that tags._readtags returns a
>> regular unordered python dict. We could change it to a sorteddict instead.
>> If we keep it this way we should probably change the hg tag command so
>> that it
>> sorts tags when it modifies the .hgtags file as well.
>> - All tests pass without modification. The added test passes as well
>> (while it
>> would fail without this change)
> I think there are some essential disadvantages with this approach:
> * It will completely rewrite the .hgtags, giving a messy history and making
> it almost impossible to merge manually with old un-sorted .hgtags.
> * Manual fixing of .hgtags will no longer be possible ... and it will
> regress in handling of cases where it has been done.
> I think it would be better if the algorithm started out by checking that
> both parents starts out as the ancestor and only have been appended to. If
> that is the case your new checks don't show any conflicts, it should take
> the new changes from one side and apply that on top of the other. That would
> be less intrusive and mostly address my concerns.
> I have clarified http://mercurial.selenic.com/wiki/TagDesign a bit - it
> should be updated if a patch like this is accepted.
> The tests should make sure to cover tricky cases like divergent change and
> removal of tags and that it is handled 'safely'.
> /Mads


thanks a lot for your review and your comments.

Sorting the tags when writing them is not strictly necessary. I was
expecting to get negative comments on that particular detail, but I
wanted to get a feel about the general direction of the patch.

The main reason why I sorted the merged tags when writing them to the
.hgtags file is that the _readtags method in tags.py loads tags into a
regular, unsorted python dictionary. Thus it does not preserve the
original tag order. There are several ways to work around this. I
think the best one would be to change tags._readtags to use a sortdict
instead of a regular dict. The sortdict class is currently defined in
config.py so perhaps it would be best to first move it to its own
separate file if we did so.

If we used a sortdict there would be no need to sort the merged tags
when saving them. I don't think there would be any serious performance
impact of doing so thanks to all the tag catching that we do.

I have reworked this patch into a patch series in which I first move
sortdict into its own file, then I use this sortdict in tags._readtags
and finally I introduce the automated merge, with the key difference
that there is no need to sort the tags after a successful merge
anymore. I will send this V2 patch series soon (unless you give me
some new comments before I do).

A nice side benefit of this change would be that we could also change
(and improve!) the way the tag command reads and writes tags. We could
reuse the tags._readtags function and create a a new writetags
function as well. Currently the tag command is "append only" in the
sense that it just adds new entries at the end of the .hgtags file.
This has several undesirable consequences:

- It makes merge conflicts very common (this is what this patch we are
discussing addresses) so in a sense this is moot.
- It creates duplicate entries in the .hgtags file which make manually
merging .hgtags files very confusing. For example, if you first add a
tag A, then add a tag B and finally move the tag A the final .hgtags
file will look something like this:


This is quite weird and hard to understand. Instead it would be best
(IMHO) if the result were:


This would be easier to understand, and easier to merge.

I think this would be an easy (and desirable) fix. I also think that
it would be backwards compatible. The only drawback that I can think
of is that we would need to write the whole .hgtags file when tagging.
I doubt this would be a problem even with a huge number of tags
(specially since tagging is not such a common operation anyway (maybe
at Facebook's scale this would be a problem though?). I will write a
patch for this and add it to my V2 patch series of this patch, unless
you think it is a bad idea.

Also, I agree that I need to add a few more tests to check the cases
in which the merge algorithm is expected to succeed and fail.



More information about the Mercurial-devel mailing list