RFC: bitmap storage for hidden

Durham Goode durham at fb.com
Mon Sep 5 20:20:08 EDT 2016


I agree that an optimized bitmap can come after the initial 
implementation.  I'm more concerned with getting this rigged into 
mercurial correctly and with appropriate cache mechanics than I am about 
shrinking the 100k.


On 9/2/16 2:05 PM, Jun Wu wrote:
> Given the fact that the plain bitmap is just 1/512 the size of changelog
> revlog index (which is small) and we don't compress the latter, I won't
> worry too much about the file size at this time. If we use the plain bitmap,
> we can mmap it so the memory usage is easily optimized.
>
> What we need here are a) "testing a bit" and b) "setting a bit". With a
> plain bitmap, they are both O(1), ignoring file system fragmentation.
>
> EWAH seems to bring a linear time complexity like O(size of the compressed
> bitmap) for both operations, which is probably not what we want.
>
> The roaringbitmap uses "sorted array" in 2 places, which makes insertion
> (setting a bit from 0 to 1) *much* harder imo. In the worst case, you have
> to "memmove" large part of the file.
>
> If we decide to optimize file size, I'd strongly like both operations (but
> especially the read) to remain O(1). Maybe use a fixed-sized first-level
> index (like git packfile fanout table) for the first 8 bits, and have a
> fixed-sized second-level index for the next 8 bits. And then have a plain
> bitmap for the next 16 bits. Avoid sorted arrays, they make insertion
> painful.
>
> Excerpts from Bryan O'Sullivan's message of 2016-09-02 09:00:26 -0700:
>> I agree that it would make a lot of sense to not use a plain bitmap. The
>> current state of the art for compressed high-performance bitmaps is here:
>> https://urldefense.proofpoint.com/v2/url?u=http-3A__roaringbitmap.org_&d=DQIGaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=nuarHzhP1wi1T9iURRCj1A&m=5shTFMQ7fvpISnJIrsSb2qnpiwW03Zd8gW2Ke-udef8&s=AywMSSY4J06X2Sd5CJc0LKswrdPngNYU57LWLYSspkc&e=
>>
>> On Fri, Sep 2, 2016 at 8:09 AM, Augie Fackler <raf at durin42.com> wrote:
>>
>>> I've considered doing a bitmap index for hidden revs, but lacked the round
>>> tuits. Have you looked at any of the more interesting compressed bitmap
>>> storage schemes like EWAH? I know git uses that for some stuff now and it's
>>> been very efficient both speed and space wise.
>>>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mercurial-2Dscm.org_mailman_listinfo_mercurial-2Ddevel&d=DQIGaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=nuarHzhP1wi1T9iURRCj1A&m=5shTFMQ7fvpISnJIrsSb2qnpiwW03Zd8gW2Ke-udef8&s=zzzWKkfQeX7y17dc5vJSXKCPaQD5kMMSKlF3chAhuYc&e=



More information about the Mercurial-devel mailing list