[PATCH RFC] repo: add an ability to hide nodes in an appropriate way

Ryan McElroy rm at fb.com
Thu Mar 30 13:08:36 EDT 2017


On 3/30/17 3:15 PM, Pierre-Yves David wrote:
> On 03/27/2017 07:24 PM, Augie Fackler wrote:
>> On Mon, Mar 27, 2017 at 10:27:33AM +0200, Pierre-Yves David wrote:
>>>
>>>
>>> On 03/27/2017 12:32 AM, Kostia Balytskyi wrote:
>>>> # HG changeset patch
>>>> # User Kostia Balytskyi <ikostia at fb.com>
>>>> # Date 1490567500 25200
>>>> #      Sun Mar 26 15:31:40 2017 -0700
>>>> # Node ID 43cd56f38c1ca2ad1920ed6804e1a90a5e263c0d
>>>> # Parent  c60091fa1426892552dd6c0dd4b9c49e3c3da045
>>>> repo: add an ability to hide nodes in an appropriate way
>>>>
>>>> Potentially frequent usecase is to hide nodes in the most appropriate
>>>> for current repo configuration way. Examples of things which use this
>>>> are rebase (strip or obsolete old nodes), shelve (strip of obsolete
>>>> temporary nodes) and probably others.
>>>> Jun Wu had an idea of having one place in core where this 
>>>> functionality
>>>> is implemented so that others can utilize it. This patch
>>>> implements this idea.
>>>
>>> I do not think this is a step in the right direction, creating 
>>> obsolescence
>>> marker is not just about hiding changeset.
>>
>> But we're using them for that today, all over the place.
>
> Actually, we do not use it all over the place. Current markers usages 
> in core are:
>
> amend:
>  - record evolution history
>  - prune the temporary amend commit
>   (that I would rather see disappear all together)
>
> rebase:
>  - record evolution history on successful completion
>
> histedit:
>  - record evolution history on successful completion
>  - prune temporary nodes on successsful completion
>    (This ones used to be stripped even with evolution)
>
> (let us talk about shelve later)
>
>
> So they are mainly used to record evolution history on successful 
> operation (their intended usage) between "real-changesets". This is 
> the intended usage of obsolescence markers.
>
> In addition, they are also used in a couple of place to hide temporary 
> changesets (in the "utility-space") after an operation succeeed.
> This a bit an unorthodox uses of the obsolescence markers, but is 
> "okay" to use them in these case because we know:
>  1) this is utility-space so user never interact with them.
>  2) we only create them for completed successful operation so we'll 
> never need them ever again.
> Strictly speaking we could just strip these temporary commits (and we 
> have done so in the past) and this will not change anything for the 
> user. Any other hiding mechanism aimed at "internal temporary" commit 
> would do the job just fine. The internal commit never needs (and 
> actually should ever) to leave the user repository.
> In practice, we could use in memory commit for most of these temporary 
> commit  (if not all) and never have to deal with hiding them.
>
>> We're also having pretty productive (I think!) discussions about 
>> non-obsmarker
>> based mechanisms for hiding things that are implementation details of
>> a sort (amend changesets that get folded immediately, shelve changes,
>> etc).
>
> (Let me be at be long to try to be clear and comprehensive)
>
> I can see three categories of things we want to hide:
>
> :obsolete changeset (evolution semantic):
>
>   A rewrite operation has created/recorded a new version of this 
> changesets. that old version no longer needs to be shown to users (if 
> possible). There is a strong semantic associated with such changesets 
> and the property will be propagated to other repositories
>
> :temporary/internal changesets:
>
>   Such changesets are created as a side effect of other operation 
> (amend, histedit, etc). They have the following properties (once we 
> are done using them)
>
>   * They are irrelevant to the users,
>   * We should never-ever base anything on them,
>   * We should never-ever access them ever again,
>   * They should never-ever leave the repository.
>   * They never stop being internal-changesets
>
> :locally-hidden changeset:
>
>   An hypothetically (not used) yet local-only hidden mechanism 
> (similar to strip but without actual repo stripping). This could be 
> used for local hiding/masking of changeset in the "real-changeset" 
> space. Such data would not be exchanged. Commit reappears when 
> "recreated" or re-pulled.
>
> ---
>
> To take practical examples:
>
> :amend:
>   - amend source: "obsolete-changeset"
>   - temporary changeset: "internal-changeset"
>
> :rebase:
>   - rebase source, on success: "obsolete-changeset"
>   - rebase result, on abort:  "local-hiding"
>
>
> :histedit:
>   - histedit source, on success:         "obsolete-changeset"
>   - histedit temporary node, on success: "internal-changeset"
>   - histedit result, on abort:           "local-hiding"
>   - histedit temporary node, on abort:   "local-hiding (internal-node)"
>
> extra notes:
>  * If local hiding (strip) would take care of obsolescence marker, 
> rebase and histedit could create them "as they go" instead of to the 
> end: "on success".
>  * In rebase --abort and histedit --abort, strip is used on freshly 
> created changesets, so its performance impact is limited
>  * we use "local hiding" (strip) on temporary nodes on histedit abort 
> because we needs to be able to reuse them if histedit is rerun. So we 
> cannot throw them in oblivion just yet.
>
> ---
>
> regarding implementations
>
> "obsolete-changesets": we already have a solid mechanism for that.
>
> "internal-changesets"" has very constrained property. it allow us to 
> abuse obsolescence markers for them.
> It would be quite easy and fast to build an ad-hoc mechanism for them 
> and stop abusing obsmarkers. (for example a simple bitmaps or maybe 
> some "internal-*" phases. Having a dedicated tacking of internal 
> changesets helps enforcing the their property and simplify the 
> obsmarker space.
>
> "locally-hidden-changesets": this is trickier. For now the only usable 
> mechanism we have for them is stripping. We cannot use obsolescence 
> marker for them since obsolescence markers are building a global 
> persistent and distributed stated and we are explicitly aims at 
> staying light and local in this case (I can elaborate on that last bit 
> if you think it is needed). Having or not having node-version does not 
> make a difference here. locally-hiding do not want to have any global 
> side effect and creating obsmarkers will always results in global side 
> effect.
>
> Having local-hidding mechanism is actually not too far away, it 
> requires a couple of things:
> 1) applying the hiding,
> 2) storing the data,
> 3) automatic lifting of the hiding.
>
> (1) is already here for free. As explained during the sprint, the API 
> for hidden things is already flexible enough from the start to be fed 
> with other mechanism than just obsolescence.
> (2) should not be too complicated, either a bitmap approach or a root 
> based approached (bitmap is probably more solid).
> (3) probably already exist in Jun "archived" experiment. We have to 
> make sure recommit/repulling a locally-hidden changeset makes it 
> available again.
>
> Overall, this is not too much work.
>
> One extra thing to take care of is obsmarkers. When we strip a 
> changesets we want to be able to strip directly related obsmarker. 
> Strip is not currently able to do so, but it will be able to do so 
> soon and we have a clear technical path here. So non-destructive 
> "hiding" of changeset would needs its counterpart for obsmarkers 
> before we can use it in case where obsmarkers are involved..
>
>>> It is about recording the
>>> evolution of time of changesets (predecessor ← [successors]). This 
>>> data is
>>> used in multiple place for advance logic, exchanged across 
>>> repository etc.
>>> Being too heavy handed on stripping will have problematic 
>>> consequence for
>>> the user experiences. Having an easy interfaces to do such heavy handed
>>> pruning will encourage developer do to such mistake.
>>
>> I'm struggling a bit to understand your argument here. I think your
>> argument is this: hiding changes can be done for many reasons, and
>> having a simple "just hide this I don't care how" API means people
>> won't think before they do the hiding. Is that an accurate
>> simplification and restating of your position here? (If no, please try
>> and restate with more nuance.)
>
> My point go a bit further than that. Let me restate:
>
> * The purpose of obsolescence marker is to (globally) track the 
> evolution of changesets through rewrite,
>
> * The fact obsolete changesets get hidden is just an (intended) 
> consequence of this tracking,
>
> * Creation of new obsolescence markers should be done after careful 
> consideration and weighting.[1]
>
> This current proposal goes in the wrong direction in regards with the 
> above. Creating a shinny "This hides nodes" hammer that use obsmarker 
> underneath is misleading. It will encourage developer to create 
> obsolescence markers in case where it is inappropriate to do so. 
> Because "hide" is the wrong semantic for markers.
>
> On a general basis, a function creating markers without taking a 
> (predecessors → successors) mapping is a strong hit that this function 
> not approaching obsolescence markers in an appropriate way.
>
> [1] eg: we allow ourself to use obsolescence marker for some internal 
> changeset because we have carefully though about it and we decided it 
> won't be problematic. I would rather not do it at all.
>
>>> The fact this function only takes a list of nodes (instead of (pec, 
>>> suc)
>>> mapping) is a good example of such simplification. Series from Jun 
>>> about
>>> histedit is another. Obsolescence markers are not meant as a generic
>>> temporary hidden mechanism for local/temporary nodes and should not 
>>> be used
>>> that way.
>>
>> In broad strokes, I agree here. Have we made any conclusive progress
>> on a mechanism for archiving changes? The reason we're seeing all
>> these "abuses" of obsolete markers is that they're currently the only
>> append-only means by which things can be discarded, and there are (as
>> you know) ugly cache implications to running strip.
>
> Strip is far from ideal. However there are multiple mitigation 
> mechanism that exist today to limit its impact on cache and there is 
> many low hanging fruit to improves this mitigation.
>
> In addition, changeset filtering currently also have some bad impact 
> on cache, so they do not simply goes away if we stop using strip. 
> Their are also many low hanging fruits to mitigate this impact.
>
>> (I'm sort of +0 on this series, but I'd be more enthusiastic about
>> having a proper hiding mechanism in place that isn't obsmarkers before
>> we go too far down this road.)
>
> To summarize my rational above, the proposed function is misusing the 
> obsolescence markers semantic and will leads to problematic usages of 
> them. So I'm -1 for it.
>
> I'm not just bringing this out of potential fear. In the past years 
> I've seen (and prevented) people to misuse obsolescence marker for 
> histedit-abort, rebase-abort, and histedit-abort again, etc… (we just 
> fixed a regression as this thread was starting).
>
> I hope this long message help to clarify various concept. We have way 
> forward to reduce the use of stripping without abusing the 
> obsolescence concept in a way that will create issue for users. These 
> way forward are in reach and would not take too long to build.
>
>

For what it's worth, I find this essay pretty convincing: obsmarkers 
have their use-case, they are lightly abused now (for temp-amend-commit) 
but it probably makes sense to not abuse them further.

It sounds like an "internal" phase above "secret" would actually map 
fairly well onto the temp-amend-commit, hidden shelves never meant to be 
exchanged, aborted rebase, etc places.

Pierre-Yves, since you've thought about this a lot, does a phase here 
make sense (note that I'm not saying a general hiding place like 
"archive" phase, but just for internal nodes that we generally don't 
want shown or exchanged, but are also not "technically obsolete".


More information about the Mercurial-devel mailing list