[PATCH 5 of 6] hidden: add hiddenset and updatevisibility logic

Durham Goode durham at fb.com
Mon May 22 18:38:07 EDT 2017


On 5/22/17 2:38 AM, Pierre-Yves David wrote:
> On 05/19/2017 05:25 AM, Gregory Szorc wrote:
>> On Thu, May 18, 2017 at 11:23 AM, Durham Goode <durham at fb.com
>> <mailto:durham at fb.com>> wrote:
>>
>>     # HG changeset patch
>>     # User Durham Goode <durham at fb.com <mailto:durham at fb.com>>
>>     # Date 1495129620 25200
>>     #      Thu May 18 10:47:00 2017 -0700
>>     # Node ID 0979ce9d40803103441b6df1ebefafe80bb9a057
>>     # Parent  2aef3c4be6ba81e9d2453760515517df6fc2b816
>>     hidden: add hiddenset and updatevisibility logic
>
> TL;DR; This series seems to try to address multiple distinct problem.
> Multiple of them having much simpler solution (some already available).
> I think we should clarify our goals here. The result will probably be to
> focus the effort on adding a simple "local-hidding" mechanism using the
> existing hiding API.
>
> --
>
> [longer reply]
>
> Having looked at the full series, I've the feeling that there are many
> distinct topic that this new class tries to address. I've gathered the
> following:
>
>  * hidden API usable by extension,
>  * speed improvement (using write time computation),
>  * an API to get notified of relevant changes,
>  * Local hiding solution,
>  * unified storage for hiding.
>
> There might be other I missed. Durham can you clarify your goals if needed?

The overall goal here is to refactor hiding to be in an isolated part of 
the code base, so we can reason about it and iterate on it more easily. 
All the goals you list become easier to implement if we have a good 
abstraction for hiding.

Part of the internal goal of this change is to put pieces in core that 
would let us do hash preservation within Facebook, so we could get rid 
of the inhibit extension.

> Trying to tackle that many different problem at the same time worries
> me. Addressing each of them independently should be simpler, in
> addition, solving issue at a lower level bring benefit to other parts of
> the code. For example, skipping obs-store loading just needed a straight
> forward cache, easy to update iteratively. Having that cache in now
> speeding up all operations that rely on the "obsolete-set" information,
> and the cache logic can be reused for other related data.

> My short opinion on all this topics are:
>
>  * API: extensible API already exists so I'm a bit confused,
>  * Speed: simpler solutions in review,
>  * notification: seems redundant with 'tr.changes' and less robust.
>  * local-hiding: Good idea, go for it (consider using ranges, not roots)
>  * unifying storage: seems unnecessary and dangerous.
>
> Longer opinion available below. I've also included technical comment in
> the patch itself.
>
> API for extensions
> ------------------
>
> Quoting durham email:
>
>> This has the nice property that all logic for propagating hiddenness is
>> contained to one function, and all the consumers just have to report what
>> commits they touched. If an external extension wants to modify how
>> hiding is
>> computed, it can just wrap this one function.
>
> I'm a bit confused about that part of your description, because the
> current implementation offer a small surface API already.
>
> There are two functions "hideablerevs" and "_getdynamicblockers"[1].
>
> * "hideablerevs": return the revision we would like to see hidden
>   (eg: obsolete revision)
>
> * "_getdynamicblockers": return revision that must be visible
>   (eg: bookmarks)
>
> Extensions can already overwrite these two functions and augment/alter
> their return to get any wished behavior.

The piece that is missing from the current API is infrastructure to 
listen for changes that affect hiding at write time. This includes being 
notified of a change, as well as having apis available to incrementally 
update hiding during that change and a pluggable place to store the new 
hidden computation results.

This series doesn't remove computehidden() as an extension point, it 
just provides the ability to hook into hiding at more locations (write 
time and storage time, in addition to the existing read time).

> Speed
> -----
>
> The current approach for computing hidden can scale well. Actually, I've
> two series on the list that speed it up down to 1ms order of magnitude.
> The first one[2] is adding a small cache to retrieve the set of obsolete
> changeset without loading the obsstore. The second one[3] is updating
> the old computehidden code to only performs O(len(visible-mutable))
> python operations.
>
> If our goal is to provide local-only hiding, we can get there faster,
> with simpler code, by simply feeding the "hideablerevs" function.
>

I eventually want every read operation to be O(number of changes you're 
asking about).  If I run 'hg log -r .' and we go perform ancestor 
queries on commits that are two months old, then that's too slow. We're 
at the point where we're beginning to think about how to avoid loading 
even parts of the changelog .i during standard read operations.

This series doesn't get us to O(1) hidden checks, but it puts in place 
the hidden abstraction that would let us eventually get there. I don't 
think we should have to solve O(1) lookups for all the inputs to hidden 
before we can get O(1) hidden lookup time.

> Notification API
> ----------------
>
> There are a similar efforts already started around the
> 'transactions.changes' dictionary. It aims at tracking all changes
> happening within a transaction. If we needs to track changes, I
> recommend reusing that logic instead of duplicating the effort (adding
> more odds for bugs and maintenance load).
>
> The 'tr.changes' approach the low level function adding data recording
> these changes. This is lower level than the approach in this patch and
> will be more robust as We need to update the one function adding data,
> instead of all the consumer of that function. We just fixed a bug from
> this lack of collaboration last week[3], so it seems important to keep
> it simple here.

tr.changes might be the appropriate spot for this. I can definitely look 
into that as a replacement for hidden.updatevisibility().

> Local hiding solution
> ---------------------
>
> I think this is the most important goal here. As pointed before, we
> could focus on a dedicated storage and rely on existing compute-hidden
> mechanism.

While local hiding is an important goal, I think it's somewhat 
tangential to this series.  Local hiding could be done in the way you 
propose, and could be done using my new apis as well. This series just 
gives us some more options for when we get to that point.

> unified storage for hiding
> --------------------------
>
> Given we can extract and compute hidden information from obsolescence in
> the 1ms range, I don't see the advantage of unifying the storage. On the
> contrary, hidden properties from obsolescence is a critical part of
> changeset-evolution. Mixing the storage will makes it harder to
> guarantee this property. So better keeping them separated.

As we've seen, there are differing opinions on the future of hiding 
commits in Mercurial and this patch opens doors to some options without 
breaking evolve. Having this conceptual distinction in core will help 
give people more insight into and confidence in the various options 
around hiding so we can have a more productive discussion next time.


More information about the Mercurial-devel mailing list