Development of a performance tracking tool for Mercurial
Philippe Pepiot
philippe.pepiot at logilab.fr
Thu May 26 05:57:10 EDT 2016
On 05/23/2016 11:38 PM, Kevin Bullock wrote:
> (resurrecting this thread now that I've had a closer look at the plan)
>
Thanks!
>> >On Apr 12, 2016, at 05:03, Philippe Pepiot <philippe.pepiot at logilab.fr> wrote:
>> >
>> >[...]
>> >Now I've a question about writing and maintaining benchmark code as we have multiple choices here:
>> >
>> >1) Use mercurial internal API (benefits: unlimited possibilities without modifying mercurial; we can write backward compatible benchmarks with some 'if' statements and benchmark older versions; it benefits from all ASV features (profiling, memory benchmarks etc). Drawbacks: duplicate code with contrib/perf.py, will break on internal API changes, need more maintenance and more code to write/keep backward compatible benchmarks).
>> >
>> >2) Use contrib/perf.py extension from the benchmarked version of mercurial (benefits: de facto backward compatible, drawbacks: limited to what the tool can do in previous versions)
>> >
>> >3) Use contrib/perf.py extension from the latest version of mercurial (benefits: no duplicate code, easier maintenance, new benchmarks benefit both tools. Drawbacks: not backward compatible for now (it works only for >= 3.7 versions)). We could also implement some glue code, either in the tracking tool or in contrib/perf.py, to list available benchmarks and their parameters.
>> >
>> >At this stage of the project my advice is to use 1), but we could also have a mix of 1) and 3). It depends on how fast the internal API changes and on your short/mid/long term objectives for the level of integration of the tracking tool.
> My first instinct tells me we should use approach #2, and teach the benchmark suite what Mercurial release a given perf* command first appeared in. This can be automated by reading the output of `hg help -e perf`.
>
> #1 sounds like a particularly bad idea given the deliberately high level of churn in our internal APIs. If we try to maintain an external tool that also tries to remember the details of that churn, I'm pretty sure it will rot and fall out of use in short order.
>
> #3 has a similar (though lesser) disadvantage. I also don't see a strong need to run new perf checks against old versions, with the exception of perhaps the current stable release.
>
> All that said, please convince me I'm wrong.:)
>
> Regarding ASV: is there any way we can use it to instrument our code (including profiling and memory tracking) without having it call into the internals? Specifically, is there a way that we can either feed it metrics that we output from the perf extension, or integrate it into perf.py such that it gets optionally imported (similar to how you can optionally use ipdb via the --debugger flag, wired up in dispatch.py)?
OK, if the internal API used in perf.py changes often, we can exclude #1
(at least until the benchmark suite is included in the mercurial repository).
Nevertheless, #1 can still be useful if you want to bisect an old
regression (and have no relevant benchmark code in the bisect range):
this can be done by transforming the perf.py benchmark into a
regular asv benchmark (quite an easy step), handling potential backward
compatibility, then running the bisect.
With #2 we cannot fix potential "bugs" in perf.py when benchmarking old
revisions. For instance, if an internal change alters a benchmark so that
it no longer computes the same thing as before, and this is fixed later
on, you will keep getting wrong values for the affected revisions even if
you re-run the benchmarks (but this is not a big issue).
About #3: currently perf.py is not backward compatible with versions
< 3.7, but this seems related to the extension code (not the benchmark
code). After embedding commands.formatteropts and
commands.debugrevlogopts and removing the norepo parameter from
perflrucache, I can run some benchmarks on 3.0 versions.
For now, #2 (and #3) can be achieved with ASV by writing a "track"
benchmark that runs the given perf.py command in a subprocess and returns
the value.
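
A minimal sketch of such a "track" benchmark follows. It is not tested
against a real repository: the paths in HG, PERF_EXT and REPO are
placeholders, the helper names are made up for illustration, and the
"! wall ... comb ..." output format is an assumption about what perf.py
prints (it may also go to stderr, so both streams are captured).

```python
import re
import subprocess

# Placeholders (assumptions): adjust to the benchmarked checkout.
HG = "hg"
PERF_EXT = "contrib/perf.py"
REPO = "."

# Assumed perf.py output line, for example:
#   ! wall 0.012345 comb 0.010000 user 0.010000 sys 0.000000 (best of 100)
WALL_RE = re.compile(r"^! wall (\d+\.\d+)", re.MULTILINE)


def parse_wall(output):
    """Extract the wall-clock time (in seconds) from perf.py output."""
    match = WALL_RE.search(output)
    if match is None:
        raise ValueError("no wall time found in perf output")
    return float(match.group(1))


def run_perf(command):
    """Run one perf.py command in a subprocess and return its wall time."""
    output = subprocess.check_output(
        [HG, "--config", "extensions.perf=" + PERF_EXT, "-R", REPO, command],
        stderr=subprocess.STDOUT,  # capture stderr too, in case perf.py uses it
        universal_newlines=True,
    )
    return parse_wall(output)


def track_perfstatus():
    """asv "track" benchmark: asv records whatever number is returned."""
    return run_perf("perfstatus")


track_perfstatus.unit = "s"
```

With this approach asv only sees a number, so its own timing, profiling
and memory features are not used; the measurement is entirely the one
done by perf.py.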
To get a closer integration (and enable the profiling features of ASV) we
could write a module that declares benchmarks (setup code and body code)
and let both asv and perf.py use it. Actually, asv benchmarks themselves
could be a candidate because they are just declarative code (functions
and classes with a naming convention and a "setup" attribute on the
function); there is no dependency on asv here.
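
To illustrate, here is a rough sketch of what such a declarative module
could look like, with a tiny runner showing that a second tool could
consume it without importing asv. The workload is a stand-in (sorting a
list), not a Mercurial operation, and discover() and run() are
hypothetical helpers, not existing asv or perf.py functions.

```python
import timeit


def _build_data():
    # Stand-in workload data; a real benchmark would open a repository.
    return list(range(1000, 0, -1))


_state = {}


def setup_sorted():
    _state["data"] = _build_data()


def time_sorted():
    sorted(_state["data"])


# asv convention: setup code attached as an attribute on the function.
time_sorted.setup = setup_sorted


class TimeSuite:
    # asv convention: a class with a setup() method and time_* methods.
    def setup(self):
        self.data = _build_data()

    def time_reverse(self):
        self.data[::-1]


def discover(namespace):
    """Hypothetical discovery: map benchmark names to (setup, body) pairs."""
    found = {}
    for name, obj in list(namespace.items()):
        if isinstance(obj, type):
            for attr in dir(obj):
                if attr.startswith("time_"):
                    instance = obj()
                    setup = getattr(instance, "setup", None)
                    found[name + "." + attr] = (setup, getattr(instance, attr))
        elif name.startswith("time_") and callable(obj):
            found[name] = (getattr(obj, "setup", None), obj)
    return found


def run(namespace, repeat=3):
    """Hypothetical runner: best time (seconds per 10 calls) per benchmark."""
    results = {}
    for name, (setup, body) in sorted(discover(namespace).items()):
        if setup is not None:
            setup()
        results[name] = min(timeit.repeat(body, number=10, repeat=repeat))
    return results
```

For example, run({"time_sorted": time_sorted, "TimeSuite": TimeSuite})
returns a dict with one timing per discovered benchmark, while asv could
collect the same functions and classes directly.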
Another idea could be to call perf.py commands programmatically in ASV
and possibly monkeypatch the gettimer() function to transform the
command into a regular asv "time benchmark".
As a first step, I think we can go for the simplest solution (i.e. #2)
and keep in mind a future inclusion (or dedicated API) in mercurial.
--
Philippe Pepiot
https://www.logilab.fr