RFC: bitmap storage for hidden

Durham Goode durham at fb.com
Wed Aug 31 20:53:17 UTC 2016


One of the performance costs that affects every command is the 
computehidden function (which computes what commits are hidden based on 
a combination of obsmarkers, bookmarks, workingcopy, phases, and tags).  
At Facebook this function alone can add 300ms to every command a user 
runs (depending on a variety of factors).

I've begun work on a storage structure for this information, so we don't 
need to recompute it during every read command.  Before I go through the 
work to polish it enough to send it upstream, I want to run it by people 
for comments.

The idea is basically:

- Have a file that acts as a bitmap, where a changelog rev number maps 
to a particular bit, and if that bit is set then the node is considered 
hidden.  computehidden will return a class that uses this to answer 
filteredrevs.__contains__ calls.

- When something is changed that would affect visibility (bookmark edit, 
phase change, obsmarker creation, dirstate movement, tag edit), that 
code is responsible for calling hidden.updatevisibilty(repo, 
affectednodes) which reports which nodes are touched by the edit (like 
the old bookmark node and the new bookmark node), and updatevisibility() 
does the minimal calculation of visibility changes and propagates those 
changes to the node ancestors as required.

- The file would be stored in .hg/cache/hidden, and would only be 122kb 
per million commits, so we'd just use the normal copy + atomic rename 
strategy for transactionality (i.e. transaction.addfilegenerator)

- In theory the file would always be up-to-date, but we can put some 
metadata at the front of the file to allow verifying the repo hasn't 
changed significantly out from under it (recording tip, working copy 
parents, obsmarker file timestamp, etc). If the repo has changed, or if 
the bitmap doesn't yet exist, it can be trivially calculated using 
hidden.updatevisibility(repo.revs('not public()')) in a manner similar 
to how it works today.


Notable issues:

- A number of places treat filteredrevs as a set, and do things like 
'myset - repo.filteredrevs'.  With a bitmap this doesn't work, so we 
need to translate it to '(r for r in myset if r not in 
repo.filteredrevs)'.  Which is probably a better O() anyway since it 
won't be affected by the size of filteredrevs.

- filteredrevs is currently a frozen set.  Editing the bitmap will have 
to be done in a way where the edits only become visible when the 
'frozen' set is invalidated.  So we maintain the existing behavior.


Follow up ideas:

- Once this ships as a cache, it might be possible to progress it to 
not-a-cache, such that we could define visibility in other terms.  For 
instance, we could allow hiding/reviving commits without obsolescence 
markers in non-evolve repos.  This is a bit more of a long term thought, 
but might be worth brainstorming about.


My code so far can be seen at:

https://bitbucket.org/DurhamG/hg/commits/branch/bitmap

Though it's currently missing some key features (cache validity 
checking, tests don't all pass yet, file is in .hg/store instead of 
.hg/cache).



More information about the Mercurial-devel mailing list