Counterintuitive tag behaviour (broken design?)

Tue Mar 13 18:26:17 CDT 2007

On Tue, Mar 13, 2007 at 10:48:55PM +0100, Georg-W. Koltermann wrote:
> Am Dienstag, den 13.03.2007, 11:54 -0500 schrieb Matt Mackall:
> > Head A                  Head B              
> > 01234567  foo -> 0      01234567 foo -> 0
> > deadcafe  foo -> 1      deadcafe foo -> 1
> >                         01234567 foo -> 2
> 
> What about this:
> 
> Head A			Head B
> 0123 foo -> 0		0123 foo -> 0
> 4567 foo -> 1		cdef foo -> 1
> 89ab foo -> 2		ff00 foo -> 2
> cdef foo -> 3
> 
> Would the final tag value be cdef or ff00?
> 
> Based on ranking it would be cdef.  But my gut feeling is that it
> should be ff00.

I would agree. Any algorithm that didn't do this would fail the 'it
just works' test.

> My feeling is based on the assumption that the user who placed the ff00
> tag value already had seen the cdef one in his view of history, and
> deliberately replaced it by ff00.
> 
> Actually we cannot be really certain that he replaced cdef when he
> assigned ff00.  It might be that he was seeing another "active" value at
> that point in time, from another head.  But that could only have been
> possible if a) that other branch had already superseded the cdef value
> even then, or b) some kind of tie-breaking had made another value than
> cdef preferable to cdef.  So however we look at it, if ff00 comes after
> cdef in some head, then at the point in time when ff00 was created it
> was preferred to cdef.  That should be reflected in picking the overall
> tag value.

And I'd agree with that too.

> (Math once was a favourite of mine at school, but now I can't imagine how
> to put this more formally, more towards being mathematically provable.  It
> remains my "feeling" only.)

No, I think you've captured it. Either cdef was the preferred value
when ff00 was added, or something else superseded it. And ff00
supersedes it transitively.
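
To make that concrete, here is a rough Python sketch of the rule, not
what Mercurial actually does. It assumes each head's .hgtags has been
reduced to the ordered list of values for one tag, and that no value is
re-assigned after being replaced (that would create a cycle, which this
sketch does not handle). The names supersession_edges and unsuperseded
are made up for illustration.

```python
def supersession_edges(heads):
    """Collect (older, newer) pairs: newer directly follows older in some head."""
    edges = set()
    for history in heads:
        for older, newer in zip(history, history[1:]):
            if older != newer:
                edges.add((older, newer))
    return edges


def unsuperseded(heads):
    """Return the values that nothing follows in any head.

    Assumption: the supersession relation is acyclic, so these are the
    maximal elements of the partial order. More than one result means a
    tie that still needs a tie-breaker.
    """
    edges = supersession_edges(heads)
    values = {value for history in heads for value in history}
    superseded = {older for older, _ in edges}
    return values - superseded


# Georg's first example: ff00 follows cdef in head B, so ff00 wins.
head_a = ["0123", "4567", "89ab", "cdef"]
head_b = ["0123", "cdef", "ff00"]
print(unsuperseded([head_a, head_b]))
```

If head A ends at 89ab instead, both 89ab and ff00 come back, which is
exactly the case where a tie-breaker has to decide.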

> Now, what really worries me is how could merges garble up that sequence.
> 
> E.g., if we have:
> 
> Head A			Head B
> 0123 foo -> 0		0123 foo -> 0
> 4567 foo -> 1		cdef foo -> 1
> 89ab foo -> 2		ff00 foo -> 2
> 
> we get a merge conflict.  The user could resolve it by selecting lines
> two and three from head A or from head B, or combine both contributors
> in either order.  In reality, when there are various tags intermixed,
> not just the one foo tag, it will be really unintuitive to the user; it
> will not be apparent what the right resolution should be.

You're right. With a collection of .hgtags files, we have a partial
ordering of values for foo. When we merge two, the result is a full
ordering, a > b > c > d. I don't think that really matters though. What we
really want to extract is "which is the greatest?" So if we merge your
example into:

0123
4567
cdef
89ab
ff00

All that really matters is which one comes last, because that's the
one that's going to matter when actually determining what the tags are.
This says ff00 > (0123 4567 cdef 89ab). In the event of a tie, we can
use the tip-most tie-breaker rule. And yes, this means I think we need
to automate .hgtags merges.
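
A hedged sketch of what such an automated merge could look like, in
Python. It is only one possible reading of the above: keep the common
prefix, then append the other head's new lines, then the tip-most
head's new lines, so the tip-most value lands last. The function names
are hypothetical, and the short ids from this thread stand in for full
changeset hashes.

```python
def merge_tag_lines(other, tipmost):
    """Union-merge two lists of "node tag" lines.

    Assumption: ties are broken by placing the tip-most head's new
    lines after the other head's, so they take effect last.
    """
    common = 0
    while (common < min(len(other), len(tipmost))
           and other[common] == tipmost[common]):
        common += 1
    return list(tipmost[:common]) + other[common:] + tipmost[common:]


def effective_tags(lines):
    """Map each tag name to the node on its last line (last line wins)."""
    tags = {}
    for line in lines:
        node, name = line.split(" ", 1)
        tags[name] = node
    return tags


head_a = ["0123 foo", "4567 foo", "89ab foo"]
head_b = ["0123 foo", "cdef foo", "ff00 foo"]  # assumed tip-most

merged = merge_tag_lines(head_a, head_b)
print(merged)
print(effective_tags(merged))
```

The interleaving of the middle lines differs from the listing above,
but as noted only the last line per tag matters.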

> So I'd like to say I'm not convinced that the algorithm I implemented in
> my patch is the best and final version that can be found, but it's a
> definite improvement over the old behavior, and it doesn't require a
> data format change.  It's good enough for my needs.  I update tag values
> often, using tags somewhat like a branch name, to indicate the latest
> "blessed" version for a particular task.

It'd be useful for you to document your tags algorithm precisely;
that'll be easier than me trying to reverse engineer it.

-- 
Mathematics is the supreme nostalgia of our time.