Linkrev perf
Jun Wu
quark at fb.com
Tue Sep 27 17:52:42 EDT 2016
Excerpts from Gregory Szorc's message of 2016-09-27 10:04:43 -0700:
> As you noted, things get real ugly real quick. There are also implications
> for changegroup transfer, which does hardcode 20 bytes for the hash.
Hardcoding 20 bytes is actually what I want - ideally old clients would read
only 20 bytes *everywhere*, so that increasing the hash length would not be a
BC break for old clients. That said, I'm not encouraging that we go this way.
> I tend to think of linkrev as just another piece of revision-derived
> metadata that could be "cached" better. When you view it through that lens,
> you realize there are tons of other things we could also throw into a
> generic "derived from revision" cache. This could even include parts of
> changeset data, such as the author, branch, and files changed. Depending on
> how the cache is implemented (say you used SQLite), this could result in
> some significant performance wins for certain operations. e.g. an index of
> file changes in changesets would make `hg log --removed` significantly
> faster. If you moved the changelog to something that wasn't a revlog, it
> also makes shallow and narrow clone much easier to implement since
> out-of-order insertions would be allowed.
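To make the quoted idea concrete, here is a minimal sketch of a "derived from
revision" index of file changes kept in SQLite. The table name, column names,
and the one-letter action codes ('A', 'M', 'R') are assumptions made up for
illustration, not an existing Mercurial format or API.

```python
import sqlite3


def build_index(changes):
    """Build an in-memory index from (rev, path, action) tuples.

    action is a hypothetical one-letter code: 'A' added, 'M' modified,
    'R' removed. Rows do not have to arrive in revision order.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE filechange (rev INTEGER, path TEXT, action TEXT)")
    # Index on (action, rev) so "which revisions removed files" is a lookup
    # instead of a scan over every changeset.
    db.execute("CREATE INDEX filechange_action ON filechange (action, rev)")
    db.executemany("INSERT INTO filechange VALUES (?, ?, ?)", changes)
    return db


def revs_with_removals(db):
    """Return the sorted revisions that removed at least one file."""
    rows = db.execute(
        "SELECT DISTINCT rev FROM filechange WHERE action = 'R' ORDER BY rev"
    )
    return [row[0] for row in rows]
```

A query like revs_with_removals() is roughly the lookup that something like
`hg log --removed` could use if such an index existed.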
I'm +1 for things that do not introduce RNG noise, if possible. (I'm starting
to reconsider the plan to move "amend_source" to "obsstore".)
One of my earlier ideas that I haven't posted here is to build extra
"linknode-manifest"s. Every "linknode-manifest" is associated with a
changelog revision. The linknode-manifest is like a manifest, but stores
different things:
fctx.path(): fctx.introrev()
First, it is a cache. Second, it could be seen as "revision-derived" data.
The nice things about this approach are:
- No RNG noise
- No BC break
- No cache-invalidation issue - once built, 100% correct permanently
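A minimal sketch of how such a linknode-manifest could be derived, assuming
linear history only (merges are ignored here) and plain dicts in place of any
real on-disk format. The function name and its arguments are hypothetical and
are not Mercurial APIs.

```python
def build_linknode_manifest(parent_lm, changed, removed, rev):
    """Return the linknode-manifest (path -> introrev) for revision `rev`.

    parent_lm: linknode-manifest of the parent revision (dict)
    changed:   paths added or modified by `rev`
    removed:   paths removed by `rev`

    Assumption: linear history, so a file touched by `rev` is introduced
    at `rev`, and every other file keeps the parent's introrev.
    """
    lm = dict(parent_lm)      # copy, so the parent's entry stays immutable
    for path in removed:
        lm.pop(path, None)    # removed files no longer appear
    for path in changed:
        lm[path] = rev        # this revision introduces the new file node
    return lm
```

Because each entry depends only on the parent's entry and on the changeset
itself, it never needs invalidating once it has been written.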
Some people think disk usage is a concern. I think it's "acceptable", or if
we really want, we could merge this info with the normal manifest on disk.
If we cannot break BC, I actually prefer this solution. This pattern also applies
to "how to make git scale".
> Thinking even bigger picture, there seem to be enough potential areas for
> improvement across many areas to - and I don't say this lightly - consider
> a new store format. Regardless of whether we decide to take that giant
> leap, I think it is important to keep a list of "known problems" and
> "things we would fix if we implemented things from scratch." Perhaps the
> "internals" help pages could start documenting these things?
I will update the wiki later.