RFC: bitmap storage for precursors and phases

Jun Wu quark at fb.com
Fri Feb 10 14:04:24 EST 2017


In general, I think this is a good direction. Some random thoughts:

  - general purposed

    I think the bitmap is not always a cache, so it should only have
    operations like set/unset/readfromdisk/writetodisk. Practically, I won't
    couple cache invalidation with the bitmap implementation.

    In additional, I'll try to avoid using Python-only types in the
    interface. So once we decide to rewrite part of the implementation in
    native C, we won't have trouble.

    See "revset" below for a possibility that bitmap is used as a non-set.

  - revset

    This is a possibility that probably won't happen any time soon.

    The revset currently uses Python set for maintaining its state. For huge
    sets, Python sets may not be a good option. And various operations could
    benefit from an always-topologically-sorted set, which is the bitmap.

  - mmap
    
    My intuition is that bitmaps fit better with mmap which can reduce the
    reading loading cost. I think "vfs.mmapread" could be a thing, and
    various places can benefit from it - Gabor recently showed interest in
    loading revlog data by mmap, I had patches that uses mmap to read revlog
    index.

In additional, not directly related to this series, I'm a big fan of
single direction data flow. But the current code base does not seem to do a
good job in this area. As we are adding more caching layers to the code
base, it'd be nice if we have some tiny framework maintaining the dependency
of all kinds of data, to be able to understand the data flow easily, and
just to be more confident about loading orders. I think people more
experienced on architecture may want to share some ideas here.

Excerpts from Stanislau Hlebik's message of 2017-02-10 17:33:28 +0000:
> Last year Durham sent a proposal for bitmap storage for computehidden() https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-September/087860.html .
> 
> It got a few useful comments, two most important comments:
> 
>   1.  Use bitmaps for lower-level data structures, for example, bitmap for public commits and bitmap for commits that are affected by obsolescence markers
>   2.  Add cache validation checks
> 
> This is a new RFC that addresses these issues.
> 
>   1.  In the code, there is a cache only for precursors. Later I'm planning to add cache for public commits.
>   2.  Cache validation key was added. It can be any key. For precursorcache I've chosen obsstore file size and mtime and cache validation key. For phasecache we can use hash of sorted phaseroots file since this file is usually small.
> 
> 
> In a few places, I decided to delete cache entirely:
> 
>   1.  Caches are blown up if we receive obsmarkers via bundle2 or exchange._pullobsolete(). This is not necessary since cache won't be valid on the next run anyway because obsstore size and/or mtime was changed. We can change it.
>   2.  Obsstore can store markers for node that are not present in the repo. Since bitmap is basically a dict from rev to boolean it can't store markers for revisions that are not in the repo. It doesn't cause issues unless missing node is received from bundle or during pull. In this case precursorcache doesn't recognize it as obsoleted. To fix it cache is deleted during changegroup.apply() if we receive a node that is in obsstore.successors dict.
>   3.  Similar thing in bundlerepo - new obsoleted nodes may have been added, let's just not use precursorcache in bundlerepo.
> 
> 
> The code is here: https://bitbucket.org/stashlebik/hg/commits/branch/hiddenbitmaps 
> It's not super clean yet, but all tests pass (except maybe for commit/code checks).
> Please look and let me know what you think!
> 
> 
> Thanks,
> Stas


More information about the Mercurial-devel mailing list