RFC: bitmap storage for precursors and phases

Stanislau Hlebik stash at fb.com
Mon Feb 20 03:36:29 EST 2017


Excerpts from Augie Fackler's message of 2017-02-19 21:06:53 -0500:
> On Fri, Feb 17, 2017 at 09:59:48PM +0000, Stanislau Hlebik wrote:
> > Excerpts from Bryan O'Sullivan's message of 2017-02-17 13:29:58 -0800:
> > > On Fri, Feb 17, 2017 at 10:30 AM, Jun Wu <quark at fb.com> wrote:
> > >
> > > > Excerpts from Stanislau Hlebik's message of 2017-02-17 11:24:34 +0000:
> > > > > As I said before we will load all non-public revs in one set and all
> > > >
> > > > The problem is, loading a Python set from disk is O(size-of-the-set).
> > > >
> > > > Bitmap's loading cost should be basically 0 (with mmap). I think that's why
> > > > we want bitmap in the first place. There are other choices like packfile
> > > > index, hash tables, but bitmap is the simplest and most efficient.
> > > >
> > >
> > > Hey folks,
> > >
> > > I haven't yet seen mention of some considerations that seem very important
> > > in driving the decision-making, so I'd appreciate it if someone could fill
> > > me in.
> > >
> > > Firstly, what's our current understanding of the sizes and compositions of
> > > these sets of numbers? In theory, we have a lot of data from practical
> > > application at Facebook, but nobody's brought it up.
> >
> > I assume that both sets (the set of non-public commits and the set
> > for the obsstore) are going to be very small compared to the repo
> > size. I expect both sets to be < 1% of the repo size, and the sets
> > are going to be sparse.
> 
> I replied elsewhere in the thread, but in my clone of hg it's on the
> order of 25-30% of the history, so assuming it's going to be very
> sparse is probably unwise.

In that case it's better to use bitmaps. But to do that we need to get
rid of the filteredrevs iteration in the scmutil.filteredhash() function.
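
To illustrate the trade-off discussed above, here is a minimal sketch of an
mmap-backed bitmap with one bit per revision. Opening it does not read the
whole file, unlike building a Python set, and a membership test touches a
single byte. The names write_bitmap, MmapBitmap and digest are hypothetical
and are not part of Mercurial. The digest() method shows one conceivable way
to derive a cache key from the raw bitmap bytes without iterating over
individual revs in Python; it is an assumption made for illustration, it is
not what scmutil.filteredhash() computes, and it would produce different key
values.

```python
import hashlib
import mmap
import os
import tempfile


def write_bitmap(path, revs, nrevs):
    """Write one bit per revision; bit r is set when r is in revs.

    nrevs must be > 0, since an empty file cannot be mmap'd.
    """
    buf = bytearray((nrevs + 7) // 8)
    for r in revs:
        buf[r >> 3] |= 1 << (r & 7)
    with open(path, "wb") as f:
        f.write(buf)


class MmapBitmap:
    """Read-only revision set backed by an mmap'd file (hypothetical helper)."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the whole file; pages are faulted in lazily on access.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __contains__(self, rev):
        byte = rev >> 3
        if rev < 0 or byte >= len(self._map):
            return False
        return bool(self._map[byte] & (1 << (rev & 7)))

    def digest(self, maxrev):
        """Hash the bits for revs <= maxrev without a per-rev Python loop."""
        if maxrev < 0:
            return None
        wanted = (maxrev >> 3) + 1
        nbytes = min(wanted, len(self._map))
        data = bytearray(self._map[:nbytes])
        if nbytes == wanted:
            # Clear the bits above maxrev in the last byte.
            data[-1] &= (1 << ((maxrev & 7) + 1)) - 1
        return hashlib.sha1(bytes(data)).digest()

    def close(self):
        self._map.close()
        self._file.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "nonpublic.bitmap")
    write_bitmap(path, {1, 9, 15}, 16)
    nonpublic = MmapBitmap(path)
    print(9 in nonpublic, 10 in nonpublic)
    nonpublic.close()
```

The hashing cost in this sketch is still proportional to the number of
revisions up to maxrev, but it runs over bytes in C rather than over
individual revs in Python, so it does not depend on how dense the set is.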

