Next plans on radixlink and hash-preserving obsstore

Jun Wu quark at fb.com
Sun Jul 9 19:05:40 UTC 2017


I'd like to comment on next plans of radixlink and hash-preserving obsstore,
since they look promising and I didn't send followups promptly.

The previous radixlink code works, but comparing to the radixtree in
revlog.c, the radixlink series I have sent is 4x larger in space usage, and
70% slower on insertion. I want to improve these areas.

A key factor for revlog.c to be space efficient is it uses revision numbers
so it does not store nodes directly, and it does not need to store values
(it's a "set" instead of a "map"). I'd like to reuse the "int as key"
strategy. So a uint32 could represent a hash. That reduces space usage
significantly.

Using offsets of strings in obsstore is possible but requires some
non-trivial changes on top of the current obsstore. Since I do want to
change obsstore for hash-preserving, I'm not very interested in improving
the current obsstore for now.

Therefore the rough plan is to have a new obsstore format with indexing
(radixlink) in a native implementation. For "native", I'm also experimenting
with Rust. I have some Rust code for the revised radixlink and its
performance is on par with revlog.c.

The current plans are: 

  1. To reduce the review burden from the community about experimental
     stuff, and to iterate quickly, I'm going to prototype in fb-hgext
     first, utilizing Facebook's Rust review resources. Review discussions
     will be visible on phab.mercurial-scm.org so you're welcomed to join if
     you have interest.
  2. Once we have some confidence about file format (like in 3 weeks or so),
     we will start a discussion about final file format and APIs in the
     mailing list.  From there we can figure out the upstream approach, like
     having a Python fallback, choosing between Rust and C, having an
     upgrade path, etc.

Meanwhile, I'll continue contribute other changes.


More information about the Mercurial-devel mailing list