Internal-changeset concept (was: re: [PATCH RFC] repo: add an ability to hide nodes in an appropriate way)

Fri Mar 31 20:17:12 EDT 2017

[And here is the actual reply, sorry for the initial misstep]

On 03/30/2017 07:08 PM, Ryan McElroy wrote:
> On 3/30/17 3:15 PM, Pierre-Yves David wrote:
>> […]
>> I hope this long message helps to clarify the various concepts. We have
>> ways forward to reduce the use of stripping without abusing the
>> obsolescence concept in a way that will create issues for users. These
>> ways forward are within reach and would not take too long to build.
>>
>>
>
> For what it's worth, I find this essay pretty convincing: obsmarkers
> have their use-case, they are lightly abused now (for temp-amend-commit)
> but it probably makes sense to not abuse them further.
>
> It sounds like an "internal" phase above "secret" would actually map
> fairly well onto the temp-amend-commit, hidden shelves never meant to be
> exchanged, aborted rebase, etc places.
>
> Pierre-Yves, since you've thought about this a lot, does a phase here
> make sense? (Note that I'm not saying a general hiding place like an
> "archive" phase, but just for internal nodes that we generally don't
> want shown or exchanged, but that are also not "technically obsolete".)

You are right. Given the properties of internal changesets, phases may be
a good fit for them.

Below is a full review of the problem space and its possible solutions.
If you only want the short conclusion, skip to the end.

I've also created a plan page related to this:

   https://www.mercurial-scm.org/wiki/InternalsPlan

Problem Space
=============

Possible approaches
-------------------

Since introducing a new "internal" concept will require a new
repository requirement, we have multiple options at hand:

    1) dedicated 'internal' phases,
    2) ad-hoc mechanism (let us say a bytemap),
    3) key in changeset extra.

(If someone thinks of something else, let me know.)

Internal changeset properties
-----------------------------

    A) They are irrelevant to the users,
    B) Nothing will be based on them,
    C) Once done with their purpose, we should not need them again,
    D) They never leave the repository,
    E) They never stop being internal changesets,

Life Cycle of Internal Changesets
---------------------------------

Let us review how we use internal changesets and when we need them to be
visible.

amend
`````

    1) transaction is open,
    2) a temporary commit is created,
    3) <temp> is used to build the result of the amend,
    4) obsmarkers are created (between old and new),
    5) <temp> is "archived" (currently with obsmarkers),
    6) transaction is committed.

The commit is created and "archived" in the same transaction. It is 
never visible to anyone.

Actually, we could fully remove this changeset with appropriate 
in-memory-ctx capabilities.

histedit
````````

Scenario:

    pick <initial-second>
    pick <initial-first>
    roll <initial-fourth> # the important part is the fold
    pick <initial-third>

(Assuming single transaction for simplicity)

    1) transaction is open,
    2) <initial-second> is rebased on base
      (possible conflict and associated transaction open/close)
       -> creates <final-first>,
    3) <initial-first> is rebased on the <final-first>
      (possible conflict and associated transaction open/close)
       -> creates <temporary-second>,
    4) <initial-fourth> is rebased on the <temporary-second> (no commit)
      (possible conflict and associated transaction open/close),
    5) changes are folded into <temporary-second> as <final-second>,
    6) <initial-third> is rebased on the <final-second>
      (possible conflict and associated transaction open/close)
       -> creates <final-third>,
    7) transaction is committed.

A possible rebase conflict during (4) will expose <temporary-second> to
the user as:
  * part of the merge conflict,
  * the working directory parent.

The user needs to see that temporary commit during the merge conflict 
resolution.

shelve
``````

shelving:

    1) transaction is open,
    2) shelved changes are committed,
    3) commit is recorded as a shelve-changeset,
    4) hiding is applied to this shelve,
    5) transaction is committed.

unshelving (I might be wrong, I've not followed everything):

    1) transaction is open
    2) uncommitted changes are put in a temporary commit
    *) access the shelve-changeset (somehow),
    3) hg rebase -r <shelve> -d <tmp>
    <) possible conflict resolution that requires transaction commit
    >) possible transaction reopen (if conflict)
    =) rebase complete
    4) grab the result of the shelve and restore working copy parent
    5) hide <tmp> and <rebased>
    6) close transaction

Note: during a possible rebase conflict, the "shelve" is currently visible
to the user. Can we change this?

* At minimum, it needs to be visible to the "hg resolve" command. This
is easy to achieve in all cases.

* Having it visible seems valuable for the user experience.

conclusion on life cycle
````````````````````````

Even if many internal changesets never need to be visible outside of the
transaction that creates them, there seem to be valid cases where the
internal changeset is exposed to the user to help with merge conflicts
(histedit and shelve).

In general, however, once an internal changeset has been hidden, we won't
need it again. There is a small exception with the <shelved> changeset.
More on that below.

Additional note about shelve
-----------------------------

The shelve extension uses a full call to the "rebase" command to merge
the shelved change onto the destination. It could directly use the core
merge mechanism to perform this graft (not using the rebase extension).
This would allow us to skip the temporary commit currently created by
rebase.

In addition, if I'm not mistaken, core-merge is able to merge with a
dirty working copy, so it might be possible to directly trigger that
"graft" without the temporary local commit. "hg resolve" would still use
and display proper information, and the pre-merge content would be
available for restoration on --abort.

That would lift the need for temporary internal changesets during
unshelve. From there, the only internal changeset involved in shelve
would be the shelved changeset itself.

In addition, since we keep track of shelved changesets, it will be easy to
feed them to the hiding logic as long as they are official shelves.
However, when the shelve is deleted, we want to be able to hide the
<shelved> changeset, and the internal-changeset concept is handy for that.

The idea of using a mechanism dedicated to shelve for hiding active
shelves allows us to lift the exception created by shelves with regard to
the life cycle of internal changesets. This can also become useful if
people start exchanging shelves between repositories (urg), as it won't
make another exception to the internal space.

Analysis of available options
=============================

Let's dive deeper into the various options we have:

Phases
------

We could use new dedicated phase(s) to keep track of internal changesets.

   advantages
   ``````````

+ dedicated phases express the distinction from real changesets well,

+ implementation is already here and fast, including all the UI bits,

+ phases already have the concept of a monotonic life cycle. It is helpful
regarding internal-life-cycle (and easy to use two phases for it).

   disadvantages
   `````````````

- Updating the phase concept means new complexity:

   * we probably want to enforce that nobody gets out of experimental, 
there is no such check on boundaries yet,

   * We either give up on using phase on anything else, or we add some 
complexity to the concept (no longer one-dimensional order)

   other
   `````

= exchange is not part of the equation, we have a large freedom 
regarding such phases,

= since internal changesets are not meant to have unexpected descendants,
the 'rooting' of phases will not be an issue.

   summary
   ```````

Phases seem like a good option. Most of the usual drawbacks of phases
regarding 'hiding' are neutralized by the "internal-changeset" properties,
and they fit well with the goal of separating 'internal' from the other
changesets.
The main reservation would be around the changes implied for the phases
concept.

ad-hoc solution
---------------

We could build a dedicated solution to track internal changesets and
their life cycle (e.g. bitmaps, root tracking).

   advantages
   ``````````

+ monotonic (visible → invisible) life cycle makes bitmaps simpler,

+ ad-hoc concept means no pre-existing constraints,

+ ability to track life cycle if wished,

   disadvantages
   `````````````

- as the feature is new, everything needs to be built from scratch: storage,
handling, safeguards, display to the user, etc.

- introduces a new UI concept

   other
   `````

= This approach allows us to preserve phases on the internal changesets,
but since they are not intended for the real-space of changesets we do not
really have a use for phases on them.

   summary
   ```````

This seems a possible way to implement internal changesets. It might
imply quite some work.
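
To give a rough idea of the ad-hoc approach, here is a minimal sketch of a tracker with one byte per revision and a one-way life cycle. It is a toy model and not Mercurial code: the class name, the state values and the in-memory layout are all assumptions made for illustration, and nothing here covers storage, safeguards or UI.

```python
class internaltracker(object):
    """Toy per-revision state map (hypothetical layout, in memory only)."""

    NOTINTERNAL, VISIBLE, ARCHIVED = 0, 1, 2

    def __init__(self):
        self._states = bytearray()

    def _ensure(self, rev):
        # Grow the map with NOTINTERNAL entries so that ``rev`` is indexable.
        if rev >= len(self._states):
            self._states.extend(b'\0' * (rev + 1 - len(self._states)))

    def markinternal(self, rev):
        self._ensure(rev)
        # Never move an archived revision back to visible.
        if self._states[rev] == self.NOTINTERNAL:
            self._states[rev] = self.VISIBLE

    def archive(self, rev):
        self._ensure(rev)
        if self._states[rev] == self.NOTINTERNAL:
            raise ValueError('rev %d is not internal' % rev)
        self._states[rev] = self.ARCHIVED

    def state(self, rev):
        if rev < len(self._states):
            return self._states[rev]
        return self.NOTINTERNAL

    def hidden(self):
        # Revisions the visibility code should filter out.
        return [r for r, s in enumerate(self._states) if s == self.ARCHIVED]
```

Because the only allowed moves are not-internal → visible → archived, the tracker never has to record history or resolve conflicting states, which is what keeps the structure simple.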

changeset data
--------------

We could use a special key in extra (eg: '_internal') to track internal 
changesets.

   advantages
   ``````````

+ consistency: makes it very clear 'internal' is a permanent property of 
the changeset (it has been created for internal use),

+ prevent-collision: 'extra' is part of the hash, making collision
between the real-space and internal-space impossible.
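
The collision argument can be illustrated with a toy hash function. This is a sketch under stated assumptions: the serialization below is made up and much simpler than the real changelog format, and toynode is a hypothetical helper. The only point is that any data fed into the hash, including an '_internal' key, changes the resulting node.

```python
import hashlib

NULLID = b'\0' * 20


def toynode(p1, p2, description, extra):
    """Hash a simplified changeset; the serialization is invented."""
    items = sorted(extra.items())
    encodedextra = b'\0'.join(k + b':' + v for k, v in items)
    text = encodedextra + b'\n\n' + description
    s = hashlib.sha1(min(p1, p2) + max(p1, p2))
    s.update(text)
    return s.digest()


# Same parents and description, differing only by the '_internal' key.
real = toynode(NULLID, NULLID, b'fix typo', {b'branch': b'default'})
internal = toynode(NULLID, NULLID, b'fix typo',
                   {b'branch': b'default', b'_internal': b'1'})
```

Here real and internal are different nodes even though the parents and description are identical, so an internal changeset cannot be mistaken for its real-space twin.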

   disadvantages
   `````````````

- need-caching: reading the extra of all non-public changesets will be too
slow, so we'll need a cache. Building such a cache is likely as
complicated as building an ad-hoc solution.

- does not handle life cycle: since commits are immutable, we cannot track
the life cycle of internal changesets with it. There is no way to convey
the difference between the internal commits we still need to see and the
others, so we will have to rely on an extra mechanism here.

   other
   `````

= using extra usually helps with transferring information from one
repository to another. This is not relevant here since internal
changesets are not meant to be exchanged.

= This approach allows us to preserve phases on the internal changesets,
but since they are not intended for the real-space of changesets we do not
really have a use for phases on them.

   summary
   ```````

Despite a couple of interesting properties, using extra for 'internal'
is not really adequate for the task at hand. It would require extra
performance work and a secondary concept to handle the life cycle.

Extra thought about life cycle
------------------------------

There are a couple of ways to handle the internal-life-cycle while
tracking a single 'internal' "state".

* late move to internal: we can create "normal" changesets and mark them
internal when we are done with them.

* external local-hiding mechanism: If we get a generic hiding mechanism, 
we could just track that a changeset is internal, but rely on the 
generic local-hiding mechanism for the second part of the life cycle.

Conclusion
==========

short version
-------------

At the current stage of my reflection, my personal choice would be:

* introduce two new phases ('internal' and 'internal-archived'),

* also add an '_internal' extra key for good measure since it adds good
properties,

* introduce a context manager to create/interact with temporary changesets.
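
As a rough illustration of the last point, a context manager could scope the "internal" mode so that commits created inside the block get the 'internal' phase and the previous mode is restored afterwards. This is only a sketch: toyrepo and internalchangesets are hypothetical names and do not correspond to any existing Mercurial API.

```python
import contextlib


class toyrepo(object):
    """Stand-in for a repository; only tracks what the sketch needs."""

    def __init__(self):
        self.internalmode = False
        self.phases = {}  # node -> phase name

    def commit(self, node):
        # New commits are 'internal' only while the special mode is active.
        self.phases[node] = 'internal' if self.internalmode else 'draft'
        return node


@contextlib.contextmanager
def internalchangesets(repo):
    """Commits made inside the block are created as 'internal'."""
    previous = repo.internalmode
    repo.internalmode = True
    try:
        yield
    finally:
        # Restore the previous mode even if the block raises.
        repo.internalmode = previous


repo = toyrepo()
with internalchangesets(repo):
    repo.commit('tmp')
repo.commit('real')
```

After this runs, 'tmp' is recorded as 'internal' and 'real' as 'draft'. Keeping the mode behind a context manager makes it hard to leave it enabled by accident.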

rationale
---------

I'm going for phase-space because 'public/draft/secret' do not make
sense for internal changesets anyway. Making that explicit with an
'internal' phase seems a good move.

In addition, we already have all the UI and concepts around phases, so
introducing a new one will not add complexity to our UI.

I go for a dedicated 'internal-archived' phase to handle visibility. We
could use a generic local hiding mechanism, but having a dedicated phase
increases insulation. Such insulation reduces the chance of a user
touching internal visibility by mistake. Implementation is not more
complex since we can already feed the visibility code from multiple sources.

I keep the 'extra' key idea to make sure we'll never collide with 
'real-space'.

implementation idea
-------------------

* phase 'internal' is visible but not exchanged. Checking out and merging
with it requires a special flag/mode unavailable to the user,

* a similar flag/mode (probably a context manager) is to be used when
creating internal changesets,

* phase 'internal-archived' has the above properties except for visibility
(it is invisible),

* <shelved> commits stay in the 'internal' phase and are explicitly fed to
the hiding mechanism. They move to the 'internal-archived' phase when deleted,

* starting at '4', phases are no longer 'linear' and use 'bit flags' for
properties,

* phase properties defined earlier still apply (eg: any phase greater than
1 is not exchanged),

* we use 'internal-archived=32', 'internal=96'. This gives us 'bit-5(32)'
→ internal; 'bit-6(64)' → visible. And 'internal-archived' < 'internal',
which respects the "natural" phase movement,

* we add checks so that a changeset with a phase having 'bit-5' set
never loses it,

* we use the '_internal' extra to guarantee the lack of collision (and as
a possible safeguard).
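
A minimal sketch of the numbering above, written as a toy Python model rather than actual Mercurial code. The helper names (isinternal, isvisible, isexchanged, checkmove) are hypothetical; only the values 32 and 96 and the "greater than 1 is not exchanged" rule come from the list.

```python
# Toy model of the proposed phase numbering; not Mercurial's actual code.
PUBLIC, DRAFT, SECRET = 0, 1, 2  # existing linear phases

INTERNAL_BIT = 1 << 5  # 32: the changeset is internal
VISIBLE_BIT = 1 << 6   # 64: the internal changeset is visible

INTERNAL_ARCHIVED = INTERNAL_BIT         # 32
INTERNAL = INTERNAL_BIT | VISIBLE_BIT    # 96


def isinternal(phase):
    return bool(phase & INTERNAL_BIT)


def isvisible(phase):
    # Regular phases do not hide anything by themselves; internal ones
    # are visible only while the visible bit is set.
    return not isinternal(phase) or bool(phase & VISIBLE_BIT)


def isexchanged(phase):
    # Rule kept from the existing phases: anything above draft stays local.
    return phase <= DRAFT


def checkmove(old, new):
    """Return ``new``, or raise ValueError if the internal bit is dropped."""
    if isinternal(old) and not isinternal(new):
        raise ValueError('internal changeset cannot leave the internal space')
    return new
```

With this encoding, moving from 'internal' (96) to 'internal-archived' (32) only clears bit-6, so the check on bit-5 passes, while any move from an internal phase to draft or secret is rejected.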

What do you think?

Cheers,

-- 
Pierre-Yves David