ideas: chg repo preloading, and new changelog index

Sean Farley sean at farley.io
Fri Dec 30 15:47:15 EST 2016


Jun Wu <quark at fb.com> writes:

> chg repo preloading
>
>   I have been thinking about speeding up repo loading for a long time.
>   Previous ideas: persistent radix tree, hidden bitmap, mmap of changelog.i.
>   
>   Recently I realized that chg (after the uisetup refactoring) could be an
>   option, assuming users use read commands more frequently than writes.
>   
>   The idea is simple: the master server (the process before fork) maintains
>   a map {repo_path: {index_hash: index, marker_hash: markers, ...}}, where
>   each *_hash is a quick hash of the properties the cached value is
>   sensitive to, such as the sizes of the relevant files, and is used to
>   decide whether the value can be reused. The forked worker gets the map
>   for free and uses it to quickly construct the repo object if the hash
>   matches.
>   
>   The master server needs a background thread doing the preloading, so it's
>   no longer stateless. Hopefully that's fine because everything being
>   preloaded is low-level, self-contained and not affected by extensions.
>   
>   However, if an extension does change the behavior of something being
>   cached here, we will have compatibility issues. It's solvable if chg has
>   APIs for 3rd-party extensions to just drop some kind of cache.
>   
>   3rd-party repo requirements can also be troublesome for things that
>   require a repo object to calculate, namely obsolete._compute*set, whereas
>   changelog.index and obsstore._readmarkers can be calculated without a repo.
>   
>   Therefore I think it's still a good idea to cache those low-level pieces
>   without a repo object.
>   
>   If this direction looks promising, I will try to start with caching the C
>   index object first. Then we can think about how to deal with the obsstore.
>
> new changelog index
>
>   (note: this is less related to chg, but fits nicely with the plan above)
>   
>   I would personally like to see an efficient changelog "index" object whose
>   code is immutable to extensions (i.e. extensions cannot change the logic
>   inside it), that is reusable outside the Python ecosystem (likely
>   implemented in C without Python.h or Rust), that takes a minimal set of
>   inputs (changelog.i, phaseroots, obsstore, while allowing customized
>   parsers), and that deals with the following independently (could be
>   implemented incrementally):
>   
>     - converting between rev number and node (and partialmatch)
>     - calculating common ancestors
>     - revset bitmap representation: native ancestors / descendants
>       construction, supporting and/or/minus operations
>     - understanding phases
>     - understanding obsolete concepts
>   
>   If that looks promising, I'll try to work on it after the above chg change.

Now, this is something I really like (the preloading stuff looks good
too, but I wanted to chime in on this part first).

First of all, I completely agree with not including Python.h (nor Rust,
though I can bend on that if needed). Not every server (or client) wants
to spin up Python (or PyPy), and I've been thinking of writing what
you've proposed for some time now. My dream is to have this small C
library used by core Mercurial, so that there is still one canonical
implementation and other clients / servers can link to it for whatever
purpose.
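
As a rough illustration of the first three bullets in the quoted list
(rev/node conversion, partialmatch, and bitmap revsets), here is a toy
sketch in Python. It is only an illustration under my own assumptions: the
library discussed above would be C, and every name here (ToyIndex, revs,
the entries format) is made up rather than taken from Mercurial.

```python
# Hypothetical sketch: none of these names exist in Mercurial.
class ToyIndex:
    """Toy changelog index: revs are list positions, nodes are hex strings."""

    def __init__(self, entries):
        # entries: list of (node_hex, parent_revs), parents before children.
        self.nodes = [node for node, _ in entries]
        self.parents = [tuple(p) for _, p in entries]
        self.revof = {node: rev for rev, node in enumerate(self.nodes)}

    def rev(self, node):
        return self.revof[node]

    def node(self, rev):
        return self.nodes[rev]

    def partialmatch(self, prefix):
        # Return the unique node starting with prefix, or None if no match.
        matches = [n for n in self.nodes if n.startswith(prefix)]
        if len(matches) > 1:
            raise ValueError("ambiguous prefix: %s" % prefix)
        return matches[0] if matches else None

    def ancestors(self, rev):
        # Bitmap (a Python int) with bit r set for each ancestor r of rev,
        # including rev itself.
        bits, stack = 0, [rev]
        while stack:
            r = stack.pop()
            if not (bits >> r) & 1:
                bits |= 1 << r
                stack.extend(self.parents[r])
        return bits


def revs(bits):
    # Expand a bitmap back into a sorted list of rev numbers.
    return [r for r in range(bits.bit_length()) if (bits >> r) & 1]


# A small diamond: rev 0 is the root, 1 and 2 are its children, 3 merges them.
idx = ToyIndex([("a1", []), ("b2", [0]), ("c3", [0]), ("d4", [1, 2])])
common = idx.ancestors(1) & idx.ancestors(2)   # "and": common ancestors
only = idx.ancestors(3) & ~idx.ancestors(1)    # "minus": in ::3 but not ::1
```

With the bitmap form, and/or/minus are single integer operations, which is
the appeal of doing this natively; a C version would presumably use a real
bitset rather than an arbitrary-precision integer.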

