RFC: bitmap storage for precursors and phases

Stanislau Hlebik stash at fb.com
Fri Feb 17 16:09:59 EST 2017


Excerpts from Jun Wu's message of 2017-02-17 10:30:45 -0800:
> Excerpts from Stanislau Hlebik's message of 2017-02-17 11:24:34 +0000:
> > As I said before we will load all non-public revs in one set and all
> 
> The problem is, loading a Python set from disk is O(size-of-the-set).
> 
> Bitmap's loading cost should be basically 0 (with mmap). I think that's why
> we want bitmap at the first place. There are other choices like packfile
> index, hash tables, but bitmap is the simplest and most efficient.

Agree, but Python set is going to be small and loading it is also fast.

> 
> > precursor revs in another set. So `ispublic()` and `isprecursor()`
> > checks are very fast, it's just a set lookup. Same with update - it's
> > just an insertion in the set.
> 
> If cache invalidation happens for common write operations and the cache
> needs to be rebuilt from scratch, then the solution is not better than
> stateful chg - the chg way is easier (no need to serialize data), and safer
> (not vulnerable to cache file corruption).
> 
> To make the bitmap cache really useful (solving problems that chg cannot not
> solve), we need to incrementally update the cache, and avoid rebuilding it
> from scratch, at least for common commands like "rebase", "amend", "commit".
> 
> > > Some code can be changed - "scmutil.filteredhash" seems to be one user
> > > that iterates "filteredrevs". But what it needs is only a hash - it
> > > could hash something else, like the mtime, size etc.
> > 
> > Bookmarks, changelog, obsstore and tags can affect filtered set. 
> > For filtered repo we'll need to use size + mtime of bookmarks,
> > changelog, obsstore, tags and maybe even smth else. That maybe
> > error-prone.
> 
> The "filteredhash" seems to be only used by tags and branchmap cache. I
> think we can actually get rid of it by only caching unfiltered versions and
> do the filtering later. It also seems wrong to have tags and filteredrevs
> mutually depend on each other.

If we can get rid of `filteredhash` then we can probably proceed with
bitmaps. Although using bitmaps will still require bigger code changes
than using serialized sets.

> 
> > > Bitmaps could also be smarter - like maintaining the min and max revisions
> > > so it does not need to be exactly len(repo).


More information about the Mercurial-devel mailing list