[PATCH 09 of 10 lazy-changelog-parse] changelog: avoid slicing raw data until needed

Wed Mar 9 08:41:42 EST 2016

On Sun, 6 Mar 2016 16:13:07 -0800, Gregory Szorc wrote:
> You'll note from the revset numbers in this series that revset execution
> time increases proportionally to the number of revset functions that access
> changelog revision data. e.g. "author(x) or author(y)" is roughly 2x slower
> than "author(x)." We're apparently decoding changelog revisions for each
> revset function that accesses the data.
> 
> There is definitely a revset optimization to avoid the redundant changelog
> reading. I'm no expert on the revset code and am not sure what the most
> appropriate layer to implement such an optimization would be. Unlike
> template keywords, we don't have a cache passed to the revset functions. We
> /could/ implement an aggressive changectx cache as a context manager and
> have it active during revset execution. We could potentially also work
> something into the revset layer itself. I'm capable of implementing the
> former pretty easily. But I'm not sure it is the best layer. I'd appreciate it
> if someone who understood the revset code better could weigh in on whether
> it is an appropriate location for more optimal behavior here.
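A minimal sketch of the context-manager cache idea from the quoted message, in plain Python. CachedReader, read() and caching() are hypothetical names, not Mercurial APIs, and `decode` stands in for whatever turns a revision number into parsed changelog data. While caching() is active, each revision is decoded at most once, so "author(x) or author(y)" would pay for one decode per revision instead of two.

```python
import contextlib
from typing import Any, Callable, Dict, Iterator, List, Optional


class CachedReader:
    """Hypothetical wrapper that memoizes decoded revisions while caching is active."""

    def __init__(self, decode: Callable[[int], Any]) -> None:
        self._decode = decode
        self._cache: Optional[Dict[int, Any]] = None  # None means caching is off

    def read(self, rev: int) -> Any:
        if self._cache is None:
            return self._decode(rev)  # no active cache: decode every time
        if rev not in self._cache:
            self._cache[rev] = self._decode(rev)
        return self._cache[rev]

    @contextlib.contextmanager
    def caching(self) -> Iterator[None]:
        if self._cache is not None:  # nested use: keep the outer cache
            yield
            return
        self._cache = {}
        try:
            yield
        finally:
            self._cache = None  # free memory once revset evaluation ends


calls: List[int] = []


def decode(rev: int) -> Dict[str, int]:
    calls.append(rev)  # record every real decode
    return {'rev': rev}


reader = CachedReader(decode)
with reader.caching():
    reader.read(1)  # e.g. author(x) decodes rev 1
    reader.read(1)  # author(y) reuses the cached result
print(calls)  # [1]
```

The cache only lives for the duration of the `with` block, which matches the suggestion of keeping it active during revset execution without holding decoded data afterwards.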

Perhaps we could make optimize() fold a sequence of ctx-reading functions
into one. For example:

  "author(x) or desc(y)" -> "_matchctx('author:x', 'desc:y')"

This would also eliminate the deduplication currently done by addset, since
the combined function visits each revision only once.
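A rough sketch of how a combined _matchctx() could be evaluated, assuming a toy `read(rev)` that returns decoded fields as a dict and treating 'author:x' / 'desc:y' as field/substring pairs (the real author() and desc() predicates match differently). The point it illustrates is that each revision is decoded once and yielded at most once, so there is nothing left for addset to deduplicate.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def matchctx(subset: Iterable[int], pats: Tuple[str, ...],
             read: Callable[[int], Dict[str, str]]) -> Iterator[int]:
    """Yield each rev in subset whose decoded data matches any 'field:value' pattern."""
    tests = [p.split(':', 1) for p in pats]
    for rev in subset:
        fields = read(rev)  # decode the changelog entry once per revision
        # Substring matching is a simplification of the real predicates.
        if any(value in fields[field] for field, value in tests):
            yield rev  # each rev is visited once, so no duplicates to remove


# Toy changelog standing in for decoded revision data (hypothetical).
LOG = {0: {'author': 'alice', 'desc': 'fix bug'},
       1: {'author': 'bob', 'desc': 'add y'},
       2: {'author': 'carol', 'desc': 'docs'}}
decoded: List[int] = []


def read(rev: int) -> Dict[str, str]:
    decoded.append(rev)
    return LOG[rev]


print(list(matchctx([0, 1, 2], ('author:alice', 'desc:y'), read)))  # [0, 1]
print(decoded)  # [0, 1, 2]: one decode per revision
```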

A drawback of this approach is that it can't handle more complex cases such as
"author(x) or X or desc(y)", where another expression X separates the
ctx-reading functions.
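A minimal sketch of the proposed optimize() rewrite, assuming a simplified tuple tree (Mercurial's actual parse-tree shapes differ) and a hypothetical CTX_READERS set of changelog-reading predicates. It folds each run of adjacent ctx-reading operands of an `or` into one _matchctx node, and leaves "author(x) or X or desc(y)" untouched because X breaks the run. Only the top-level `or` is examined here.

```python
from typing import List

# Hypothetical, simplified tree shapes; Mercurial's real parse trees differ:
#   ('func', name, arg), ('symbol', name), ('or', t1, t2, ...)
CTX_READERS = {'author', 'desc'}  # assumed set of changelog-reading predicates


def is_ctx_reader(tree: tuple) -> bool:
    return tree[0] == 'func' and tree[1] in CTX_READERS


def fold_or(tree: tuple) -> tuple:
    """Fold each run of adjacent ctx-reading operands of an 'or' into one _matchctx call."""
    if tree[0] != 'or':
        return tree
    out: List[tuple] = []
    run: List[tuple] = []

    def flush() -> None:
        # Two or more adjacent readers become one node; a lone reader is kept as-is.
        if len(run) > 1:
            pats = tuple('%s:%s' % (t[1], t[2]) for t in run)
            out.append(('func', '_matchctx', pats))
        else:
            out.extend(run)
        run.clear()

    for operand in tree[1:]:
        if is_ctx_reader(operand):
            run.append(operand)
        else:
            flush()
            out.append(operand)
    flush()
    return out[0] if len(out) == 1 else ('or',) + tuple(out)


# "author(x) or desc(y)" folds into a single call:
print(fold_or(('or', ('func', 'author', 'x'), ('func', 'desc', 'y'))))
# -> ('func', '_matchctx', ('author:x', 'desc:y'))
# "author(x) or X or desc(y)": the readers are not adjacent, so nothing folds.
print(fold_or(('or', ('func', 'author', 'x'), ('symbol', 'X'), ('func', 'desc', 'y'))))
```

Folding only adjacent operands keeps the rewrite purely local; handling the mixed case would require reordering operands of `or`, which this sketch deliberately does not attempt.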