RFC: bitmap storage for hidden

Jun Wu quark at fb.com
Fri Sep 2 17:05:58 EDT 2016


Given the fact that the plain bitmap is just 1/512 the size of changelog
revlog index (which is small) and we don't compress the latter, I won't
worry too much about the file size at this time. If we use the plain bitmap,
we can mmap it so the memory usage is easily optimized.

What we need here are a) "testing a bit" and b) "setting a bit". With a
plain bitmap, they are both O(1), ignoring file system fragmentation.

EWAH seems to bring a linear time complexity like O(size of the compressed
bitmap) for both operations, which is probably not what we want.

The roaringbitmap uses "sorted array" in 2 places, which makes insertion
(setting a bit from 0 to 1) *much* harder imo. In the worst case, you have
to "memmove" large part of the file. 

If we decide to optimize file size, I'd strongly like both operations (but
especially the read) to remain O(1). Maybe use a fixed-sized first-level
index (like git packfile fanout table) for the first 8 bits, and have a
fixed-sized second-level index for the next 8 bits. And then have a plain
bitmap for the next 16 bits. Avoid sorted arrays, they make insertion
painful.

Excerpts from Bryan O'Sullivan's message of 2016-09-02 09:00:26 -0700:
> I agree that it would make a lot of sense to not use a plain bitmap. The
> current state of the art for compressed high-performance bitmaps is here:
> http://roaringbitmap.org/ 
> 
> On Fri, Sep 2, 2016 at 8:09 AM, Augie Fackler <raf at durin42.com> wrote:
> 
> > I've considered doing a bitmap index for hidden revs, but lacked the round
> > tuits. Have you looked at any of the more interesting compressed bitmap
> > storage schemes like EWAH? I know git uses that for some stuff now and it's
> > been very efficient both speed and space wise.
> >


More information about the Mercurial-devel mailing list