[PATCH RFC] repo: add an ability to hide nodes in an appropriate way

Thu Mar 30 10:15:08 EDT 2017

On 03/27/2017 07:24 PM, Augie Fackler wrote:
> On Mon, Mar 27, 2017 at 10:27:33AM +0200, Pierre-Yves David wrote:
>>
>>
>> On 03/27/2017 12:32 AM, Kostia Balytskyi wrote:
>>> # HG changeset patch
>>> # User Kostia Balytskyi <ikostia at fb.com>
>>> # Date 1490567500 25200
>>> #      Sun Mar 26 15:31:40 2017 -0700
>>> # Node ID 43cd56f38c1ca2ad1920ed6804e1a90a5e263c0d
>>> # Parent  c60091fa1426892552dd6c0dd4b9c49e3c3da045
>>> repo: add an ability to hide nodes in an appropriate way
>>>
>>> Potentially frequent usecase is to hide nodes in the most appropriate
>>> for current repo configuration way. Examples of things which use this
>>> are rebase (strip or obsolete old nodes), shelve (strip of obsolete
>>> temporary nodes) and probably others.
>>> Jun Wu had an idea of having one place in core where this functionality
>>> is implemented so that others can utilize it. This patch
>>> implements this idea.
>>
>> I do not think this is a step in the right direction, creating obsolescence
>> marker is not just about hiding changeset.
>
> But we're using them for that today, all over the place.

Actually, we do not use it all over the place. Current markers usages in 
core are:

amend:
  - record evolution history
  - prune the temporary amend commit
   (that I would rather see disappear all together)

rebase:
  - record evolution history on successful completion

histedit:
  - record evolution history on successful completion
  - prune temporary nodes on successsful completion
    (This ones used to be stripped even with evolution)

(let us talk about shelve later)

So they are mainly used to record evolution history on successful 
operation (their intended usage) between "real-changesets". This is the 
intended usage of obsolescence markers.

In addition, they are also used in a couple of place to hide temporary 
changesets (in the "utility-space") after an operation succeeed.
This a bit an unorthodox uses of the obsolescence markers, but is "okay" 
to use them in these case because we know:
  1) this is utility-space so user never interact with them.
  2) we only create them for completed successful operation so we'll 
never need them ever again.
Strictly speaking we could just strip these temporary commits (and we 
have done so in the past) and this will not change anything for the 
user. Any other hiding mechanism aimed at "internal temporary" commit 
would do the job just fine. The internal commit never needs (and 
actually should ever) to leave the user repository.
In practice, we could use in memory commit for most of these temporary 
commit  (if not all) and never have to deal with hiding them.

> We're also having pretty productive (I think!) discussions about non-obsmarker
> based mechanisms for hiding things that are implementation details of
> a sort (amend changesets that get folded immediately, shelve changes,
> etc).

(Let me be at be long to try to be clear and comprehensive)

I can see three categories of things we want to hide:

:obsolete changeset (evolution semantic):

   A rewrite operation has created/recorded a new version of this 
changesets. that old version no longer needs to be shown to users (if 
possible). There is a strong semantic associated with such changesets 
and the property will be propagated to other repositories

:temporary/internal changesets:

   Such changesets are created as a side effect of other operation 
(amend, histedit, etc). They have the following properties (once we are 
done using them)

   * They are irrelevant to the users,
   * We should never-ever base anything on them,
   * We should never-ever access them ever again,
   * They should never-ever leave the repository.
   * They never stop being internal-changesets

:locally-hidden changeset:

   An hypothetically (not used) yet local-only hidden mechanism (similar 
to strip but without actual repo stripping). This could be used for 
local hiding/masking of changeset in the "real-changeset" space. Such 
data would not be exchanged. Commit reappears when "recreated" or re-pulled.

---

To take practical examples:

:amend:
   - amend source: "obsolete-changeset"
   - temporary changeset: "internal-changeset"

:rebase:
   - rebase source, on success: "obsolete-changeset"
   - rebase result, on abort:  "local-hiding"

:histedit:
   - histedit source, on success:         "obsolete-changeset"
   - histedit temporary node, on success: "internal-changeset"
   - histedit result, on abort:           "local-hiding"
   - histedit temporary node, on abort:   "local-hiding (internal-node)"

extra notes:
  * If local hiding (strip) would take care of obsolescence marker, 
rebase and histedit could create them "as they go" instead of to the 
end: "on success".
  * In rebase --abort and histedit --abort, strip is used on freshly 
created changesets, so its performance impact is limited
  * we use "local hiding" (strip) on temporary nodes on histedit abort 
because we needs to be able to reuse them if histedit is rerun. So we 
cannot throw them in oblivion just yet.

---

regarding implementations

"obsolete-changesets": we already have a solid mechanism for that.

"internal-changesets"" has very constrained property. it allow us to 
abuse obsolescence markers for them.
It would be quite easy and fast to build an ad-hoc mechanism for them 
and stop abusing obsmarkers. (for example a simple bitmaps or maybe some 
"internal-*" phases. Having a dedicated tacking of internal changesets 
helps enforcing the their property and simplify the obsmarker space.

"locally-hidden-changesets": this is trickier. For now the only usable 
mechanism we have for them is stripping. We cannot use obsolescence 
marker for them since obsolescence markers are building a global 
persistent and distributed stated and we are explicitly aims at staying 
light and local in this case (I can elaborate on that last bit if you 
think it is needed). Having or not having node-version does not make a 
difference here. locally-hiding do not want to have any global side 
effect and creating obsmarkers will always results in global side effect.

Having local-hidding mechanism is actually not too far away, it requires 
a couple of things:
1) applying the hiding,
2) storing the data,
3) automatic lifting of the hiding.

(1) is already here for free. As explained during the sprint, the API 
for hidden things is already flexible enough from the start to be fed 
with other mechanism than just obsolescence.
(2) should not be too complicated, either a bitmap approach or a root 
based approached (bitmap is probably more solid).
(3) probably already exist in Jun "archived" experiment. We have to make 
sure recommit/repulling a locally-hidden changeset makes it available again.

Overall, this is not too much work.

One extra thing to take care of is obsmarkers. When we strip a 
changesets we want to be able to strip directly related obsmarker. Strip 
is not currently able to do so, but it will be able to do so soon and we 
have a clear technical path here. So non-destructive "hiding" of 
changeset would needs its counterpart for obsmarkers before we can use 
it in case where obsmarkers are involved..

>> It is about recording the
>> evolution of time of changesets (predecessor ← [successors]). This data is
>> used in multiple place for advance logic, exchanged across repository etc.
>> Being too heavy handed on stripping will have problematic consequence for
>> the user experiences. Having an easy interfaces to do such heavy handed
>> pruning will encourage developer do to such mistake.
>
> I'm struggling a bit to understand your argument here. I think your
> argument is this: hiding changes can be done for many reasons, and
> having a simple "just hide this I don't care how" API means people
> won't think before they do the hiding. Is that an accurate
> simplification and restating of your position here? (If no, please try
> and restate with more nuance.)

My point go a bit further than that. Let me restate:

* The purpose of obsolescence marker is to (globally) track the 
evolution of changesets through rewrite,

* The fact obsolete changesets get hidden is just an (intended) 
consequence of this tracking,

* Creation of new obsolescence markers should be done after careful 
consideration and weighting.[1]

This current proposal goes in the wrong direction in regards with the 
above. Creating a shinny "This hides nodes" hammer that use obsmarker 
underneath is misleading. It will encourage developer to create 
obsolescence markers in case where it is inappropriate to do so. Because 
"hide" is the wrong semantic for markers.

On a general basis, a function creating markers without taking a 
(predecessors → successors) mapping is a strong hit that this function 
not approaching obsolescence markers in an appropriate way.

[1] eg: we allow ourself to use obsolescence marker for some internal 
changeset because we have carefully though about it and we decided it 
won't be problematic. I would rather not do it at all.

>> The fact this function only takes a list of nodes (instead of (pec, suc)
>> mapping) is a good example of such simplification. Series from Jun about
>> histedit is another. Obsolescence markers are not meant as a generic
>> temporary hidden mechanism for local/temporary nodes and should not be used
>> that way.
>
> In broad strokes, I agree here. Have we made any conclusive progress
> on a mechanism for archiving changes? The reason we're seeing all
> these "abuses" of obsolete markers is that they're currently the only
> append-only means by which things can be discarded, and there are (as
> you know) ugly cache implications to running strip.

Strip is far from ideal. However there are multiple mitigation mechanism 
that exist today to limit its impact on cache and there is many low 
hanging fruit to improves this mitigation.

In addition, changeset filtering currently also have some bad impact on 
cache, so they do not simply goes away if we stop using strip. Their are 
also many low hanging fruits to mitigate this impact.

> (I'm sort of +0 on this series, but I'd be more enthusiastic about
> having a proper hiding mechanism in place that isn't obsmarkers before
> we go too far down this road.)

To summarize my rational above, the proposed function is misusing the 
obsolescence markers semantic and will leads to problematic usages of 
them. So I'm -1 for it.

I'm not just bringing this out of potential fear. In the past years I've 
seen (and prevented) people to misuse obsolescence marker for 
histedit-abort, rebase-abort, and histedit-abort again, etc… (we just 
fixed a regression as this thread was starting).

I hope this long message help to clarify various concept. We have way 
forward to reduce the use of stripping without abusing the obsolescence 
concept in a way that will create issue for users. These way forward are 
in reach and would not take too long to build.

Cheers,

-- 
Pierre-Yves David