RFC: bitmap storage for precursors and phases
Sean Farley
sean at farley.io
Tue Feb 21 18:43:52 EST 2017
Jun Wu <quark at fb.com> writes:
> Excerpts from Bryan O'Sullivan's message of 2017-02-17 13:29:58 -0800:
>> On Fri, Feb 17, 2017 at 10:30 AM, Jun Wu <quark at fb.com> wrote:
>>
>> > Excerpts from Stanislau Hlebik's message of 2017-02-17 11:24:34 +0000:
>> > > As I said before we will load all non-public revs in one set and all
>> >
>> > The problem is, loading a Python set from disk is O(size-of-the-set).
>> >
>> > Bitmap's loading cost should be basically 0 (with mmap). I think that's why
>> > we want bitmap at the first place. There are other choices like packfile
>> > index, hash tables, but bitmap is the simplest and most efficient.
>> >
>>
>> Hey folks,
>>
>> I haven't yet seen mention of some considerations that seem very important
>> in driving the decision-making, so I'd appreciate it if someone could fill
>> me in.
>>
>> Firstly, what's our current understanding of the sizes and compositions of
>> these sets of numbers? In theory, we have a lot of data from practical
>> application at Facebook, but nobody's brought it up.
>>
>> To clarify why this matters, if an obsstore contains say 100 entries, then
>> a very naive implementation should be fine. If it's related to something
>> else, such as the size of, amount of activity in, or local age of a repo,
>> then maybe we have a different set of decisions to make.
>
> First, I agree that if N is small, O(N) or O(log(N)), or O(1), do not matter
> much in practice. I tend to always reason about future-proof solutions,
> which may or may not be a good habit - I'll adjust in the future.
>
> I think there are multiple topics being discussed:
>
> 1. How to solve the overhead loading hiddenrevs with minimal changes?
> 2. Why is the bitmap format interesting (future use-cases)?
>
> For 1,
>
> I once tried to make one bit of revlog flag as the source of truth of
> "hidden", which looks a nice solution. But it got rejected because revlog
> has to be append-only. Moving the flag bits out formed the bitmap idea
> naturally.
>
> Bitmap fits here but it does not mean we have to use it to solve this
> particular problem. For example, stateful chg will get us some easy wins,
> and see the problem analysis and proposed solution below.
>
> Note that we actually have a simple caching format for the "hiddenrevs".
> See repoview.py, computehidden.
>
> But what's wrong with "computehidden" is the choice of the cache key - it
> calls "hideablerevs", which takes 0.5 seconds in obsolete.getrevs(repo,
> 'obsolete'), which reads the giant obsstore (in hg-committed) and do some
> complex computation. The result is then hashed and used as a cache key of
> "computehidden".
>
> So, if we do not want to know the "obsolete()" revisions, but just want to
> know what's hidden. Changing the cache key would be a very straightforward
> and effective solution which does not require a new file format.
>
> That'll help commands like "hg st", "hg log -T foo", etc. But it won't
> help complex "log"s which do require the "obsolete()" revisions. To speed
> up the latter, we can probably just copy the caching infrastructure from
> repoview.py.
>
> For 2,
>
> len(filteredrevs) in my hg-committed is 2155. It's small. I tend to think
Another data point for my hg-committed:
len(filteredrevs) > 8k
len(obs markers) > 120k
More information about the Mercurial-devel
mailing list