RFC: bitmap storage for precursors and phases

Pierre-Yves David pierre-yves.david at ens-lyon.org
Tue Mar 7 07:11:14 EST 2017



On 02/14/2017 02:04 AM, Sean Farley wrote:
> Jun Wu <quark at fb.com> writes:
>
>> In general, I think this is a good direction. Some random thoughts:
>>
>>   - general purposed
>>
>>     I think the bitmap is not always a cache, so it should only have
>>     operations like set/unset/readfromdisk/writetodisk. Practically, I won't
>>     couple cache invalidation with the bitmap implementation.
>>
>>     In additional, I'll try to avoid using Python-only types in the
>>     interface. So once we decide to rewrite part of the implementation in
>>     native C, we won't have trouble.
>>
>>     See "revset" below for a possibility that bitmap is used as a non-set.
>>
>>   - revset
>>
>>     This is a possibility that probably won't happen any time soon.
>>
>>     The revset currently uses Python set for maintaining its state. For huge
>>     sets, Python sets may not be a good option. And various operations could
>>     benefit from an always-topologically-sorted set, which is the bitmap.
>>
>>   - mmap
>>
>>     My intuition is that bitmaps fit better with mmap which can reduce the
>>     reading loading cost. I think "vfs.mmapread" could be a thing, and
>>     various places can benefit from it - Gabor recently showed interest in
>>     loading revlog data by mmap, I had patches that uses mmap to read revlog
>>     index.
>>
>> In additional, not directly related to this series, I'm a big fan of
>> single direction data flow. But the current code base does not seem to do a
>> good job in this area. As we are adding more caching layers to the code
>> base, it'd be nice if we have some tiny framework maintaining the dependency
>> of all kinds of data, to be able to understand the data flow easily, and
>> just to be more confident about loading orders. I think people more
>> experienced on architecture may want to share some ideas here.
>
> I was thinking about a more high-level approach (please feel free to
> pick apart):
>
> r = repo.filtered("bitmap1")
> r2 = r.filtered("bitmap2")
>
> So that r2 would be an intersection of bitmap1 and bitmap2 (haven't
> thought about a union nor the inverse).


This double-filtering idea is interesting. Maybe we could have the
'repoview' API understand repo.filtered("foo+bar") as the combination of
the 'foo' and 'bar' filters. The smart parts of repoview (e.g. the filter
hierarchy for cache inheritance, the cache keys, etc.) should be able to
automatically compute what to do for a combination.

Exposing the bitmap at that level seems strange. I think it is better to
have the internal implementation of the filtering rely on a bitmap than
to have the repository/repoview API expose bitmaps directly.
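
To make the idea concrete, here is a rough sketch only. None of these
names exist in Mercurial: 'Bitmap' and 'FilteredView' are hypothetical
stand-ins, and I am assuming each named filter is backed by a bitmap of
the revisions it hides, so that "foo+bar" hides the union of the two
(i.e. the visible set is the intersection Sean describes). Unlike the
real repo.filtered(), this toy version stacks filters when chained.

```python
class Bitmap(object):
    """Fixed-size bitmap indexed by revision number (hypothetical)."""

    def __init__(self, size):
        self._size = size
        self._bits = bytearray((size + 7) // 8)

    def set(self, rev):
        self._bits[rev >> 3] |= 1 << (rev & 7)

    def unset(self, rev):
        self._bits[rev >> 3] &= ~(1 << (rev & 7)) & 0xFF

    def __contains__(self, rev):
        return bool(self._bits[rev >> 3] & (1 << (rev & 7)))

    def union(self, other):
        # Assumes both bitmaps cover the same number of revisions.
        result = Bitmap(self._size)
        for i in range(len(self._bits)):
            result._bits[i] = self._bits[i] | other._bits[i]
        return result


class FilteredView(object):
    """Toy repoview: the bitmap stays internal, callers only use names."""

    def __init__(self, nrevs, hidden_by_filter, name=None):
        self._nrevs = nrevs
        self._filters = hidden_by_filter  # filter name -> Bitmap of hidden revs
        self.filtername = name
        self._hidden = Bitmap(nrevs)
        for part in (name.split('+') if name else []):
            self._hidden = self._hidden.union(self._filters[part])

    def filtered(self, name):
        # Chaining filtered("foo").filtered("bar") gives the same view
        # as filtered("foo+bar").
        parts = self.filtername.split('+') if self.filtername else []
        for part in name.split('+'):
            if part not in parts:
                parts.append(part)
        return FilteredView(self._nrevs, self._filters, '+'.join(parts))

    def revs(self):
        return [r for r in range(self._nrevs) if r not in self._hidden]


foo = Bitmap(6)
foo.set(1)
foo.set(3)
bar = Bitmap(6)
bar.set(3)
bar.set(4)
repo = FilteredView(6, {'foo': foo, 'bar': bar})

combined = repo.filtered("foo+bar")
chained = repo.filtered("foo").filtered("bar")
print(combined.revs())  # [0, 2, 5]
print(chained.revs())   # [0, 2, 5]
```

The point of the sketch is that callers only ever pass filter names; the
bitmap is an implementation detail that could later be swapped for a C
or mmap-backed version without touching the API.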

Cheers,

-- 
Pierre-Yves David

