RFC: bitmap storage for precursors and phases
Pierre-Yves David
pierre-yves.david at ens-lyon.org
Tue Mar 7 07:11:14 EST 2017
On 02/14/2017 02:04 AM, Sean Farley wrote:
> Jun Wu <quark at fb.com> writes:
>
>> In general, I think this is a good direction. Some random thoughts:
>>
>> - general purpose
>>
>> I think the bitmap is not always a cache, so it should only have
>> operations like set/unset/readfromdisk/writetodisk. In practice, I wouldn't
>> couple cache invalidation with the bitmap implementation.
>>
>> In addition, I'd try to avoid using Python-only types in the
>> interface, so that if we decide to rewrite part of the implementation in
>> native C, we won't have trouble.
>>
>> See "revset" below for a possible use of the bitmap that is not a cache.
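A minimal sketch of such a storage-only bitmap, assuming one bit per revision number packed into a `bytearray` (least significant bit first within each byte). The class name `Bitmap` and its method names are hypothetical, mirroring the set/unset/readfromdisk/writetodisk operations listed above. The on-disk form is raw bytes rather than a pickle or other Python-only format, so a C implementation could read the same file, and there is deliberately no cache invalidation logic.

```python
class Bitmap:
    """Hypothetical storage-only bitmap: bit N records whether revision N is set.

    No cache invalidation happens here; callers decide when the data is stale.
    """

    def __init__(self, data=b""):
        # Raw bytes, least significant bit first within each byte.
        self._data = bytearray(data)

    def _grow(self, rev):
        # Make sure the byte holding bit `rev` exists (new bytes are zero).
        needed = rev // 8 + 1
        if len(self._data) < needed:
            self._data.extend(b"\x00" * (needed - len(self._data)))

    def set(self, rev):
        self._grow(rev)
        self._data[rev // 8] |= 1 << (rev % 8)

    def unset(self, rev):
        # Bits beyond the stored bytes are already unset.
        if rev // 8 < len(self._data):
            self._data[rev // 8] &= ~(1 << (rev % 8)) & 0xFF

    def isset(self, rev):
        index = rev // 8
        return index < len(self._data) and bool(self._data[index] & (1 << (rev % 8)))

    def writetodisk(self, path):
        # Plain bytes on disk, readable from any language.
        with open(path, "wb") as fp:
            fp.write(bytes(self._data))

    @classmethod
    def readfromdisk(cls, path):
        with open(path, "rb") as fp:
            return cls(fp.read())
```

For example, after `bm.set(10)` the file written by `bm.writetodisk(path)` holds two bytes, with bit 2 of the second byte set.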
>>
>> - revset
>>
>> This is a possibility that probably won't happen any time soon.
>>
>> The revset code currently uses Python sets to maintain its state. For huge
>> sets, Python sets may not be a good option. And various operations could
>> benefit from an always-topologically-sorted set, which a bitmap indexed by
>> revision number naturally provides.
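To illustrate why a bitmap behaves as an always-sorted set: in Mercurial a revision's parents always have lower revision numbers than the revision itself, so walking the set bits from lowest to highest yields revisions in a topological order. A minimal sketch, assuming the bitmap is held as a non-negative Python integer (bit N set means revision N is in the set); the helper name `iterrevs` is hypothetical.

```python
def iterrevs(bits):
    """Yield the revision numbers whose bits are set, in ascending order.

    Ascending revision order is a topological order in Mercurial, since a
    revision is always numbered after its parents.
    """
    rev = 0
    while bits:
        # Index of the lowest set bit, which skips a run of unset bits at once.
        low = (bits & -bits).bit_length() - 1
        rev += low
        yield rev
        # Drop the bit just yielded and continue from the next revision.
        bits >>= low + 1
        rev += 1
```

For example, `list(iterrevs(0b10100))` gives `[2, 4]`; no sorting step is needed because the order comes from the bit positions themselves.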
>>
>> - mmap
>>
>> My intuition is that bitmaps fit well with mmap, which can reduce the
>> loading cost. I think "vfs.mmapread" could be a thing, and various
>> places could benefit from it: Gabor recently showed interest in loading
>> revlog data by mmap, and I had patches that use mmap to read the revlog
>> index.
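A minimal sketch of testing one revision's bit through `mmap` instead of reading the whole file, assuming a raw layout where bit N of the file stands for revision N (least significant bit first in each byte). It uses the standard-library `mmap` module directly; `vfs.mmapread` is only a proposal in this thread, and the helper name `mmapisset` is hypothetical.

```python
import mmap
import os


def mmapisset(path, rev):
    """Return True if bit `rev` is set in the bitmap file at `path`.

    The file is mapped rather than read, so the OS typically only pages in
    the parts that are actually accessed.
    """
    with open(path, "rb") as fp:
        size = os.fstat(fp.fileno()).st_size
        index = rev // 8
        # mmap refuses to map an empty file; a short file just means "unset".
        if index >= size:
            return False
        with mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return bool(m[index] & (1 << (rev % 8)))
```

The design choice here is that the cost of a lookup does not grow with the size of the bitmap file, which matters more as repositories get large.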
>>
>> In addition, and not directly related to this series, I'm a big fan of
>> single-direction data flow, but the current code base does not seem to do a
>> good job in this area. As we add more caching layers to the code base,
>> it'd be nice to have a tiny framework maintaining the dependencies between
>> all kinds of data, so that we can understand the data flow easily and be
>> more confident about loading orders. People more experienced with
>> architecture may want to share some ideas here.
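One possible shape for such a framework, as a minimal sketch: each kind of derived data declares what it depends on, and the load order is computed from those declarations instead of being implied by call order. The names `DataRegistry`, `register`, and `loadorder`, as well as the example data names, are hypothetical; the ordering uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter


class DataRegistry:
    """Hypothetical registry of data sources and the sources they depend on."""

    def __init__(self):
        self._deps = {}

    def register(self, name, depends=()):
        # Declare `name` and the data that must be loaded before it.
        self._deps[name] = set(depends)

    def loadorder(self):
        # Dependencies always come before their dependents.
        # Raises graphlib.CycleError if the declarations form a cycle.
        return list(TopologicalSorter(self._deps).static_order())


reg = DataRegistry()
reg.register("changelog")
reg.register("phases", depends=["changelog"])
reg.register("obsstore")
reg.register("hidden", depends=["phases", "obsstore"])
```

`reg.loadorder()` places "changelog" before "phases" and both "phases" and "obsstore" before "hidden"; the relative order of independent entries is left to `graphlib`.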
>
> I was thinking about a higher-level approach (please feel free to
> pick it apart):
>
> r = repo.filtered("bitmap1")
> r2 = r.filtered("bitmap2")
>
> Then r2 would be the intersection of bitmap1 and bitmap2 (I haven't
> thought about a union or the inverse).
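If each filter is backed by a bitmap, these combinations are cheap bitwise operations. A minimal sketch, assuming bitmaps are non-negative Python integers where bit N stands for revision N, and that `nrevs` is the number of revisions in the repository (needed so the inverse stays finite); all names are hypothetical.

```python
def intersect(a, b):
    # Revisions present in both bitmaps.
    return a & b


def union(a, b):
    # Revisions present in either bitmap.
    return a | b


def inverse(a, nrevs):
    # Flip only the bits for revisions 0..nrevs-1.
    return ~a & ((1 << nrevs) - 1)
```

For example, with `a = 0b0110` and `b = 0b0011`, `intersect(a, b)` is `0b0010`, `union(a, b)` is `0b0111`, and `inverse(a, 4)` is `0b1001`.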

This double-filtering idea is interesting. Maybe we could have the
'repoview' API understand repo.filtered("foo+bar") as the combination of
the foo and bar filters. The smart parts of repoview (e.g. the filter
hierarchy for cache inheritance, cache keys, etc.) should be able to
compute automatically what to do for such a combination.

Exposing the bitmap at that level seems strange. I think it is better to
have the internal implementation of the filtering rely on a bitmap than
to have the repository/repoview API expose bitmaps directly.
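A minimal sketch of that separation, assuming each base filter can produce a bitmap (a Python integer, bit N for revision N) of the revisions it hides. A combined name such as "foo+bar" hides the union of the individual hidden sets, so the visible revisions are the intersection of what each filter leaves visible, and callers only ever receive a frozenset of revision numbers. `FILTERS`, `canonicalname`, `hiddenbits`, and `filteredrevs` are hypothetical names, not the actual repoview API.

```python
# Hypothetical base filters: filter name -> function returning the bitmap of
# hidden revisions (bit N set means revision N is hidden).
FILTERS = {
    "foo": lambda: 0b0101,  # hides revisions 0 and 2
    "bar": lambda: 0b1100,  # hides revisions 2 and 3
}


def canonicalname(filtername):
    # "bar+foo" and "foo+bar" describe the same view, so give them a single
    # canonical name that could also serve as a cache key.
    return "+".join(sorted(set(filtername.split("+"))))


def hiddenbits(filtername):
    """Internal side: combine the component filters with bitwise OR."""
    bits = 0
    for part in canonicalname(filtername).split("+"):
        bits |= FILTERS[part]()
    return bits


def filteredrevs(filtername):
    """Public side: expose plain revision numbers, never the bitmap itself."""
    bits = hiddenbits(filtername)
    return frozenset(rev for rev in range(bits.bit_length()) if bits >> rev & 1)
```

With these example filters, `filteredrevs("foo+bar")` hides revisions 0, 2 and 3, and the bitmap could later be replaced by another representation without changing what callers see.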
Cheers,
--
Pierre-Yves David
More information about the Mercurial-devel mailing list