Topics [was: news from the topic experiment]

Thu Oct 13 10:01:25 EDT 2016

On 10/13/2016 07:33 AM, Erik van Zijst wrote:
> Working on our Bitbucket spike I wondered if topics could perhaps
> benefit from a small simplification. Instead of adding the topic name
> as an additional field, what if we defined a topic commit by merely
> adding a boolean property to the meta dict; e.g. `is_topic: True`?
> Named branches would not have the property set.

As a foreword, I agree the frontier between named branches and topics is
thin. The differences between them are useful, but if we can make their
management really close, that would be valuable. If we did not have the
backward compatibility constraints, I would happily have `hg branch foo`
create a topic and something like `hg branch --long-lived` create
what is currently a named branch.

So, thanks for exploring possibilities to make this frontier thinner.
However, I can see some issues with some aspects of this proposal. Using
the same field for either branch or topic makes them mostly mutually
exclusive. Publishing a topic on a named branch would require altering
the changeset data (and therefore the hash). This means people could use
topics only with the default branch (or accept significantly more
complexity to work around this). As I understand, Bitbucket enforces a
single master branch, so that might actually fit your model. This is
probably too restrictive for the general project (more about that later).
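
To illustrate the hash issue, here is a rough sketch using a toy hashing
scheme. It is not Mercurial's real changeset format: `toy_node` and its
field layout are made up for the example, and `is_topic` is the flag name
from the proposal above. The point is only that any field feeding the hash
cannot be rewritten without producing a new node.

```python
import hashlib


def toy_node(parent, extra, description):
    """Hash a simplified changeset (hypothetical layout, not the real one)."""
    payload = "\0".join([
        parent,
        "\n".join(f"{key}:{value}" for key, value in sorted(extra.items())),
        description,
    ])
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


parent = "0" * 40

# Draft changeset: the single shared field holds the topic name.
draft = toy_node(parent, {"branch": "my-feature", "is_topic": "True"}, "fix bug")

# Publishing onto the named branch "stable" means rewriting that same field.
published = toy_node(parent, {"branch": "stable"}, "fix bug")

hash_changed = draft != published
```

With separate fields for branch and topic, the branch value would already
be in place at commit time and would not need rewriting on publication.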

> It would seem to me that this could have some benefits:
>
> 1. There would be no risk of a name collision between the branch and
> topic namespaces.

I'm not certain this actually avoids the risk of name collision. People
could use the same branch/topic name on different changesets with
different values for the flag. That would lead to both a topic and a
named branch existing with the same name.

In all cases we should have the local UI fight hard to prevent people
from creating collisions between branches and topics (and have some
decent way to point out name conflicts if they happen).
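
A minimal sketch of how such a collision could arise and be reported,
assuming changesets are modelled as plain dicts carrying the proposed
`is_topic` flag. `find_collisions` is a hypothetical helper for this
example, not an existing Mercurial function.

```python
from collections import defaultdict


def find_collisions(changesets):
    """Return names used both as a topic and as a named branch."""
    kinds = defaultdict(set)
    for extra in changesets:
        kind = "topic" if extra.get("is_topic") else "branch"
        kinds[extra["branch"]].add(kind)
    return sorted(name for name, seen in kinds.items() if len(seen) == 2)


changesets = [
    {"branch": "default"},
    {"branch": "foo", "is_topic": True},  # "foo" committed as a topic
    {"branch": "foo"},                    # "foo" committed as a named branch
    {"branch": "bar", "is_topic": True},
]

collisions = find_collisions(changesets)
```

Here "foo" ends up in both roles, so the flag alone does not rule out the
clash; something still has to check for it.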

> During our little Bitbucket spike and demo during the sprint we made
> the assumption that topics and branches should never clash, which
> allowed me to put them in the existing, single branches namespace.
> This would seem desirable as topics are essentially branches, with the
> only real difference being their anticipated longevity. This greatly
> simplified the UI as hardly anything needed to be modified.

That is an interesting thought.

I think there is some value in having the option to use (branch,
topic) pairs (more about this later). However, just as people are
not really exposed to named branches until they get out of default, a UI
could still omit all named branch information as long as nothing other
than "default" exists.

> 2. Interoperability with non-topic clients would mostly just work.
>
> Currently, when cloning a topics repo with an old client, all topics
> would appear as an amorphous mess of anonymous heads on default.
> Instead, if we dropped the separate topic name field and just used the
> branch name as the topic name, an old client would see the same layout
> as a topic-enabled client and while an old client would not be able to
> create new topic commits, read-only clients should be totally fine.
> This could be a big boon for existing ecosystem tools like CI servers
> that wouldn't have to be modified.

There are some very interesting ramifications to this proposal, even if
we do not go with the flag approach. We could store the full (branch,
topic) pair in the branch field. For example, we could use ":" as a
separator: branch=BRANCH:TOPIC. Not only would this allow old clients to
transport the data (we already have that), but it also means old clients
can view and preserve that data (and this is new), even if they do not
get the behavior improvements related to topics. That would be a large
usability boost for old clients.
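
The encoding could look roughly like the sketch below. The helper names
`encode` and `decode` are hypothetical, and the sketch assumes neither the
branch nor the topic name may contain the separator.

```python
SEPARATOR = ":"


def encode(branch, topic=None):
    """Pack a (branch, topic) pair into a single branch-field value."""
    for name in (branch, topic):
        if name is not None and SEPARATOR in name:
            raise ValueError(f"{SEPARATOR!r} cannot be used in a name: {name!r}")
    if topic is None:
        return branch
    return f"{branch}{SEPARATOR}{topic}"


def decode(field):
    """Split a branch-field value back into (branch, topic or None)."""
    branch, sep, topic = field.partition(SEPARATOR)
    return branch, (topic if sep else None)


field = encode("stable", "my-feature")  # what an old client would display
pair = decode(field)                    # what a topic-aware client would see
plain = decode("default")               # a changeset without any topic
```

An old client would simply treat "stable:my-feature" as an opaque branch
name, which is enough for it to show and preserve the information.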

This is a great lead, thank you very much.

> The only downside I can think of is that when the topic and original
> branch name are separate fields, that a topic sort of remains
> associated with the branch it was based on. This would provide
> implicit merge and rebase targets and therefore slightly shorter
> commands. However, I'm not sure that's worth giving up the above
> points.

I think having the (named branch, topic) pair is really useful,
especially as we can expect great gains from sensible and clear defaults
(merge, rebase, behind computation, etc). In addition, we can keep hiding
the named branch concept from all users who do not need it. I think we
can have a good trade-off regarding extra features vs extra complexity
while keeping an initial complexity similar to not having named branches
at all.
(But as usual, I'm open to being convinced that the right trade-off is
somewhere else.)

We need to explore a bit more the consequences of having the same topic
on multiple branches, but I'm not too worried: we can eventually define
some good behavior+constraint pairs that make this work.

> I do realize it's quite possible I'm overlooking other reasons for
> having topics encoded as an additional namespace, separate from the
> named branch name.

That email was very useful, please send more of them ☺♥

Cheers,

-- 
Pierre-Yves David