ideas: chg repo preloading, and new changelog index

Jun Wu quark at
Tue Dec 27 11:35:07 EST 2016

chg repo preloading

  I have been thinking about speeding up repo loading for a long time.
  Previous ideas were a persistent radix tree, a hidden bitmap, and mmap'ing
  changelog.i.
  Recently I realized that chg (after the uisetup refactoring) could be an
  option, assuming users use read commands more frequently than writes.
  The idea is simple: the master server (the process before fork) maintains
  a map {repo_path: {index_hash: index, marker_hash: markers, ...}}, where
  each *_hash is a quick hash of change-sensitive properties, such as the
  sizes of the relevant files, used to decide whether the cached value can
  still be used. The forked worker gets the map for free and uses it to
  quickly construct the repo object if the hash matches.
  The master server needs a background thread to do the preloading, so it is
  no longer stateless. Hopefully that is fine, because everything being
  preloaded is low-level, self-contained, and not affected by extensions.
  However, if an extension does change the behavior of something being
  cached here, we will have compatibility issues. It's solvable if chg has
  APIs that let 3rd-party extensions drop specific kinds of cache.
  3rd-party repo requirements can also be troublesome for things that
  require a repo object to calculate, namely obsolete._compute*set, whereas
  changelog.index and obsstore._readmarkers can be calculated without a
  repo. Therefore I think it's still a good idea to cache those low-level
  things without a repo object.
  If this direction looks promising, I will try to start with caching the C
  index object first. Then we can think about how to deal with the obsstore.
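
  A rough sketch of the map described above, under stated assumptions:
  PreloadCache, fingerprint, and the loader callback are hypothetical names
  and not existing chg or Mercurial APIs, and the "quick hash" is taken to
  be the (size, mtime) of each relevant file. The drop() method stands in
  for the kind of API an extension could call to discard a cache.

```python
import os


def fingerprint(paths):
    """Cheap validity key: (path, size, mtime_ns) per file.

    A missing file is recorded as (path, None, None). This is an assumed
    stand-in for the "quick hash" mentioned in the text.
    """
    parts = []
    for p in paths:
        try:
            st = os.stat(p)
            parts.append((p, st.st_size, st.st_mtime_ns))
        except FileNotFoundError:
            parts.append((p, None, None))
    return tuple(parts)


class PreloadCache:
    """Hypothetical {repo_path: {name: (fingerprint, value)}} map."""

    def __init__(self):
        self._entries = {}

    def _paths(self, repo_path, files):
        return [os.path.join(repo_path, f) for f in files]

    def preload(self, repo_path, name, files, loader):
        # Take the fingerprint before loading: if a file changes while
        # the loader runs, the stored key is stale and get() will miss.
        fp = fingerprint(self._paths(repo_path, files))
        value = loader(repo_path)
        self._entries.setdefault(repo_path, {})[name] = (fp, value)

    def get(self, repo_path, name, files):
        """Return the cached value, or None if absent or out of date."""
        entry = self._entries.get(repo_path, {}).get(name)
        if entry is None:
            return None
        fp, value = entry
        if fp != fingerprint(self._paths(repo_path, files)):
            return None
        return value

    def drop(self, name):
        """Discard one kind of cache for every repo."""
        for per_repo in self._entries.values():
            per_repo.pop(name, None)
```

  In this sketch the master process would call preload() from its
  background thread, and a forked worker would call get() and fall back to
  the normal loading path whenever it returns None.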

new changelog index

  (note: this is less related to chg, but fits nicely with the plan above)
  I would personally like to see an efficient changelog "index" object whose
  code is immutable to extensions (i.e. extensions cannot change the logic
  inside it), reusable outside the Python ecosystem (likely implemented in
  Rust, or in C without Python.h), taking a minimal set of inputs
  (changelog.i, phaseroots, obsstore, while allowing customized parsers),
  and handling the following independently (could be implemented
  incrementally):
    - converting between rev numbers and nodes (including partialmatch)
    - calculating common ancestors
    - a bitmap representation of revsets: native construction of
      ancestors / descendants, with support for and/or/minus operations
    - understanding phases
    - understanding obsolescence concepts
  If that looks promising, I'll try to work on it after the above chg change.
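
  To make the bitmap idea concrete, here is a toy sketch in Python (the real
  thing would presumably be C or Rust, as noted above). ToyIndex and torevs
  are hypothetical names, the parent table is an in-memory list instead of a
  parsed changelog.i, and a Python int serves as the bitmap, with bit N set
  when rev N is in the set. It assumes parents always have smaller rev
  numbers than their children.

```python
NULLREV = -1


class ToyIndex:
    """Toy index: parents[rev] = (p1, p2), with NULLREV for no parent."""

    def __init__(self, parents):
        self.parents = parents

    def ancestors(self, revs):
        """Bitmap of the given revs plus all of their ancestors."""
        seen = 0
        stack = list(revs)
        while stack:
            r = stack.pop()
            if r == NULLREV or (seen >> r) & 1:
                continue
            seen |= 1 << r
            stack.extend(self.parents[r])
        return seen

    def descendants(self, revs):
        """Bitmap of the given revs plus all of their descendants."""
        result = 0
        for r in revs:
            result |= 1 << r
        # One pass in increasing rev order is enough because parents
        # always have smaller rev numbers than their children.
        for r, ps in enumerate(self.parents):
            if any(p != NULLREV and (result >> p) & 1 for p in ps):
                result |= 1 << r
        return result

    def commonancestors(self, a, b):
        """Bitmap of revs that are ancestors of both a and b."""
        return self.ancestors([a]) & self.ancestors([b])


def torevs(bitmap):
    """Expand a bitmap into a sorted list of rev numbers."""
    return [r for r in range(bitmap.bit_length()) if (bitmap >> r) & 1]


# History: 0 <- 1 <- 2, 1 <- 3, and 4 is a merge of 2 and 3.
index = ToyIndex([(-1, -1), (0, -1), (1, -1), (1, -1), (2, 3)])
common = index.commonancestors(2, 3)
# and / or / minus map to &, |, and "& ~" on the bitmaps.
only_via_2 = index.ancestors([4]) & ~index.ancestors([3])
```

  Here torevs(common) gives [0, 1] and torevs(only_via_2) gives [2, 4].
  Phases and obsolescence are left out of the sketch.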
