RFC: bitmap storage for precursors and phases

Augie Fackler raf at durin42.com
Sun Feb 19 21:06:53 EST 2017


On Fri, Feb 17, 2017 at 09:59:48PM +0000, Stanislau Hlebik wrote:
> Excerpts from Bryan O'Sullivan's message of 2017-02-17 13:29:58 -0800:
> > On Fri, Feb 17, 2017 at 10:30 AM, Jun Wu <quark at fb.com> wrote:
> >
> > > Excerpts from Stanislau Hlebik's message of 2017-02-17 11:24:34 +0000:
> > > > As I said before we will load all non-public revs in one set and all
> > >
> > > The problem is, loading a Python set from disk is O(size-of-the-set).
> > >
> > > Bitmap's loading cost should be basically 0 (with mmap). I think that's why
> > > we want bitmap in the first place. There are other choices like packfile
> > > index, hash tables, but bitmap is the simplest and most efficient.
> > >
> >
> > Hey folks,
> >
> > I haven't yet seen mention of some considerations that seem very important
> > in driving the decision-making, so I'd appreciate it if someone could fill
> > me in.
> >
> > Firstly, what's our current understanding of the sizes and compositions of
> > these sets of numbers? In theory, we have a lot of data from practical
> > application at Facebook, but nobody's brought it up.
>
> I assume that both sets (set for nonpublic commits and set for
> obsstore) are going to be very small compared to the repo size. I
> expect both sets to be < 1% of the repo size. And the sets are going
> to be sparse.

I replied elsewhere in the thread, but in my clone of hg it's on the
order of 25-30% of the history, so assuming it's going to be very
sparse is probably unwise.
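
To make the density point concrete, here is a minimal sketch (not
Mercurial code) of a one-bit-per-revision file read back through mmap.
The helper names (write_bitmap, contains, demo), the bit layout, and the
40000-revision repo with every fourth revision marked are all
assumptions for illustration; nothing here reflects an actual on-disk
format.

```python
import mmap
import os
import tempfile


def write_bitmap(path, revs, nrevs):
    """Write one bit per revision; bit i is set when rev i is in revs."""
    buf = bytearray((nrevs + 7) // 8)
    for rev in revs:
        buf[rev >> 3] |= 1 << (rev & 7)
    with open(path, "wb") as fp:
        fp.write(buf)


def contains(bitmap, rev):
    """Membership test: one byte lookup plus a mask."""
    return bool(bitmap[rev >> 3] & (1 << (rev & 7)))


def demo(nrevs=40000, every=4):
    """Mark every `every`-th rev (about 25% of a hypothetical history)."""
    revs = range(0, nrevs, every)
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_bitmap(path, revs, nrevs)
        with open(path, "rb") as fp:
            # Mapping the file does not parse it; pages are read on access.
            bitmap = mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ)
            try:
                size = len(bitmap)
                hits = (contains(bitmap, 0),
                        contains(bitmap, 1),
                        contains(bitmap, 4))
            finally:
                bitmap.close()
    finally:
        os.remove(path)
    return size, hits
```

The bitmap costs nrevs / 8 bytes no matter how many revisions are
marked. Assuming a flat list of 4-byte revision numbers as the
alternative, the list is smaller only while fewer than 1 in 32
revisions (about 3%) are members, so at 25-30% the bitmap is also the
more compact of the two.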

