Linkrev perf

Jun Wu quark at fb.com
Tue Sep 27 17:52:42 EDT 2016


Excerpts from Gregory Szorc's message of 2016-09-27 10:04:43 -0700:
> As you noted, things get real ugly real quick. There are also implications
> for changegroup transfer, which does hardcode 20 bytes for the hash.

A hardcoded 20 bytes is actually what I want - ideally old clients read only
20 bytes *everywhere*, so if we increase the hash length, it would not be a BC
break for old clients. That said, I'm not encouraging that we go this way.

> I tend to think of linkrev as just another piece of revision-derived
> metadata that could be "cached" better. When you view it through that lens,
> you realize there are tons of other things we could also throw into a
> generic "derived from revision" cache. This could even include parts of
> changeset data, such as the author, branch, and files changed. Depending on
> how the cache is implemented (say you used SQLite), this could result in
> some significant performance wins for certain operations. e.g. an index of
> file changes in changesets would make `hg log --removed` significantly
> faster. If you moved the changelog to something that wasn't a revlog, it
> also makes shallow and narrow clone much easier to implement since
> out-of-order insertions would be allowed.
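
For illustration, here is a minimal sketch of what such an index of file
changes could look like with SQLite. It is not Mercurial code: the table name
"file_changes", its columns, and the helper revs_with_removals() are all
hypothetical, and the rows are made-up sample data.

```python
import sqlite3

# Hypothetical schema: one row per (changelog revision, path, action).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file_changes (rev INTEGER, path TEXT, action TEXT)")
# An index on the action lets "which revisions removed files?" avoid a full scan.
conn.execute("CREATE INDEX idx_action_rev ON file_changes (action, rev)")

# Made-up sample history.
rows = [
    (0, "a.txt", "added"),
    (0, "b.txt", "added"),
    (1, "a.txt", "modified"),
    (2, "b.txt", "removed"),
]
conn.executemany("INSERT INTO file_changes VALUES (?, ?, ?)", rows)


def revs_with_removals(conn):
    """Return the revisions that removed at least one file, in ascending order."""
    cur = conn.execute(
        "SELECT DISTINCT rev FROM file_changes "
        "WHERE action = 'removed' ORDER BY rev"
    )
    return [row[0] for row in cur]


removed_revs = revs_with_removals(conn)
```

A query like this answers the question behind `hg log --removed` from the
index alone, without reading every changeset.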

I'm +1 for things that do not introduce RNG noise, if possible. (I'm starting
to reconsider the plan to move "amend_source" to "obsstore").

One of my earlier ideas that I haven't posted here is to build extra
"linknode-manifest"s. Every "linknode-manifest" is associated with a
changelog revision. The linknode-manifest is like a manifest, but stores
different things:

  fctx.path(): fctx.introrev()

First, it is a cache. Second, it could be seen as "revision-derived" data.
The nice things about this approach are:
  - No RNG noise
  - No BC break
  - No cache-invalidation issue - once built, 100% correct permanently
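
To make the idea concrete, here is a minimal sketch in plain Python. It does
not use Mercurial's API: Commit, build_linknode_manifests() and the toy
history are hypothetical names for illustration, and merges are ignored (each
commit has at most one parent). Each linknode-manifest maps a path to the
changelog revision that introduced the file revision visible at that commit.

```python
from typing import Dict, List, NamedTuple, Optional, Set


class Commit(NamedTuple):
    """Toy changeset (hypothetical model, not a Mercurial object)."""
    parent: Optional[int]  # parent changelog revision, None for the root
    changed: Set[str]      # paths added or modified by this commit
    removed: Set[str]      # paths removed by this commit


def build_linknode_manifests(history: List[Commit]) -> List[Dict[str, int]]:
    """Build one {path: introducing revision} mapping per changelog revision.

    Assumes parents always have a lower revision number than their children.
    """
    manifests: List[Dict[str, int]] = []
    for rev, commit in enumerate(history):
        base = manifests[commit.parent] if commit.parent is not None else {}
        current = dict(base)  # start from the parent's mapping
        for path in commit.removed:
            current.pop(path, None)
        for path in commit.changed:
            current[path] = rev  # this commit introduces a new file revision
        manifests.append(current)
    return manifests


history = [
    Commit(None, {"a.txt", "b.txt"}, set()),  # rev 0
    Commit(0, {"a.txt"}, set()),              # rev 1
    Commit(1, set(), {"b.txt"}),              # rev 2
    Commit(1, {"b.txt"}, set()),              # rev 3, a second head on rev 1
]
manifests = build_linknode_manifests(history)
# manifests[3] == {"a.txt": 1, "b.txt": 3}
```

Every entry depends only on the commit itself and its ancestors, so adding
new commits never changes a mapping that has already been built.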

Some people think disk usage is a concern. I think it's "acceptable", or, if
we really want to, we could merge this info with the normal manifest on disk.

If we cannot break BC, I actually prefer this solution. This pattern also applies
to "how to make git scale".

> Thinking even bigger picture, there seem to be enough potential areas for
> improvement across many areas to - and I don't say this lightly - consider
> a new store format. Regardless of whether we decide to take that giant
> leap, I think it is important to keep a list of "known problems" and
> "things we would fix if we implemented things from scratch." Perhaps the
> "internals" help pages could start documenting these things?

I will update the wiki later.


More information about the Mercurial-devel mailing list