Development of a performance tracking tool for Mercurial

Philippe Pepiot philippe.pepiot at
Thu Jun 30 03:41:25 EDT 2016

On 06/27/2016 06:53 PM, Augie Fackler wrote:
>> On Jun 27, 2016, at 5:24 AM, Philippe Pepiot wrote:
>> Hi,
>> I rewrote the benchmarks suite ( 
>> using contrib/ from tested version of mercurial while keeping 
>> old helper functions to write plain benchmarks easily.
>> Here is a summary of the current status of the project:
>> All my patches are merged upstream into AirSpeed Velocity, especially 
>> ordering commit by graph revision instead of date which was the big one.
>> I'm running a demo that benchmarks 
>> all revisions added to the repository over the past month, plus a 
>> sparse set of earlier commits (since 3.4).
> Sweet!
>> One interesting result is that I'm benchmarking mercurial against the 
>> same mercurial repository, i.e. the reference repository is pulled, 
>> which explains all the small detected regressions. So benchmarking a 
>> moving repository should be avoided (it seems trivial to say, but I 
>> wasn't sure how this would show up on the graphs).
> Yes, this makes sense. We should probably take a bit of time to define 
> specific repositories as of specific heads for benchmarking (I think 
> we should probably have at least one for each of small, medium, and 
> large?)

I cloned a new reference repository for hg (so it is no longer pulled), 
but its working directory was at the tip revision, which caused a huge 
false-positive regression on the "perfstatus" benchmark. I updated it to 
the null revision and the false regression is now "fixed".

Yes, the choice of reference repositories is important here.
One other thing that could be interesting to work on is generating 
specific reference repository states during the benchmark (in a setup 
function) and then running specific contrib/ benchmarks against them. 
For instance, the "perfstatus" benchmark against a repository that has 
uncommitted changes, untracked files, etc. That would represent some 
real scenarios and is close to the original plan (except for the "unit 
test" part). What do you think?
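
To make the idea more concrete, here is a rough sketch of what such a 
benchmark could look like as an ASV "track" benchmark. It is only an 
illustration under several assumptions: the class name, the helper 
functions, the file counts and the HG_PERF_EXTENSION environment variable 
are all hypothetical, and the "! wall ..." timing line is the output shape 
I assume from the perf extension (it may be printed on stderr, so both 
streams are captured).

```python
import os
import re
import shutil
import subprocess
import tempfile

# Hypothetical: path to the perf extension of the benchmarked Mercurial,
# provided by whoever runs the suite.
PERF_EXTENSION = os.environ.get("HG_PERF_EXTENSION", "")

# Assumed shape of a timing line, e.g.
# "! wall 0.001234 comb 0.010000 user 0.010000 sys 0.000000 (best of 100)"
WALL_RE = re.compile(r"^! wall (\d+\.\d+)", re.MULTILINE)


def parse_wall_time(output):
    """Return the first wall-clock time (in seconds) found in `output`."""
    match = WALL_RE.search(output)
    if match is None:
        raise ValueError("no timing line found in output")
    return float(match.group(1))


def write_files(root, count, prefix, content):
    """Create `count` small files named <prefix><n>.txt under `root`."""
    paths = []
    for n in range(count):
        path = os.path.join(root, "%s%d.txt" % (prefix, n))
        with open(path, "w") as fh:
            fh.write(content)
        paths.append(path)
    return paths


class DirtyWorkingDirectory(object):
    """ASV-style class: setup() builds the state, track_*() measures it."""

    def _hg(self, *args):
        # Run hg inside the generated repository, merging stderr into stdout.
        out = subprocess.check_output(
            ["hg"] + list(args), cwd=self.repo, stderr=subprocess.STDOUT)
        return out.decode("utf-8", "replace")

    def setup(self):
        self.repo = tempfile.mkdtemp(prefix="hg-bench-")
        self._hg("init")
        tracked = write_files(self.repo, 100, "tracked", "v1\n")
        self._hg("commit", "--addremove", "-m", "base", "-u", "bench")
        # Uncommitted changes: rewrite every other tracked file with
        # content of a different size.
        for path in tracked[::2]:
            with open(path, "w") as fh:
                fh.write("v2, modified\n")
        # Untracked files: created but never added.
        write_files(self.repo, 50, "untracked", "new\n")

    def teardown(self):
        shutil.rmtree(self.repo)

    def track_perfstatus(self):
        out = self._hg(
            "--config", "extensions.perf=" + PERF_EXTENSION, "perfstatus")
        return parse_wall_time(out)

    track_perfstatus.unit = "seconds"
```

The state is rebuilt in setup() rather than pulled from a live repository, 
so the benchmarked working directory stays identical from one revision to 
the next.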

Note that notification using an Atom feed is now merged:

>> ASV is actively maintained and some new features are incoming:
>> Regressions 
>> notifications using atom feed, this one is part of the plan. If 
>> people prefer to receive notifications via email, there are tools 
>> like rss2email or online services to transform new feed items into an 
>> email.
>> This one (WIP) is 
>> quite interesting because it tracks improvements too and provides a 
>> useful summary page.
>> In the plan, 
>> , the 
>> next topic is using unit test execution time as benchmark result and 
>> related topics (Annotation system, handle changed tests, scenario 
>> based benchmarks). Most of them are shell-style (.t) tests that spawn 
>> a lot of hg subprocesses, and I wonder whether it is relevant to use 
>> them as benchmark results because they work on very small repositories 
>> (generated during the tests) and I think most of the time is spent on 
>> hg startup. Maybe we could detect startup time regressions here, but I 
>> think we would miss other regressions that are insignificant 
>> compared to the whole test duration. I wonder whether it is worth 
>> putting effort into this topic, but I may be missing something: have 
>> you already used unit test execution time to compare performance 
>> changes introduced by a commit?
>> Some other topics might require more work, like improving the web 
>> interface (for instance having a "per commit" view that summarizes 
>> regressions and improvements introduced by a single commit), 
>> integrating the benchmark suite into mercurial, or writing more 
>> benchmarks. I could start by covering all combinations of 
>> benchmarks/options offered in 
>> contrib/ Another topic could be extension benchmarking. I 
>> have paused development while waiting for your feedback, hoping (or 
>> not ;)) that some real regressions will be detected soon to validate 
>> the tool. I'll add the atom feed once it gets merged in ASV.
>> Cheers,
>> On 05/26/2016 11:57 AM, Philippe Pepiot wrote:
>>> On 05/23/2016 11:38 PM, Kevin Bullock wrote:
>>>> (resurrecting this thread now that I've had a closer look at the plan)
>>> Thanks !
>>>>> >On Apr 12, 2016, at 05:03, Philippe 
>>>>> Pepiot<philippe.pepiot at> wrote:
>>>>> >
>>>>> >[...]
>>>>> >Now I've a question about writing and maintaining benchmark code 
>>>>> as we have multiple choices here:
>>>>> >
>>>>> >1) Use mercurial internal API (benefits: unlimited possibilities 
>>>>> without modifying mercurial and we can write backward compatible 
>>>>> benchmark with some 'if' statements and benchmarks older versions, 
>>>>> profits all ASV features (profiling, memory benchmarks etc). 
>>>>> Drawbacks: duplicate code with contrib/, will break on 
>>>>> internal API changes, need more maintenance and more code to 
>>>>> write/keep backward compatible benchmarks).
>>>>> >
>>>>> >2) Use contrib/ extension from the benchmarked version of 
>>>>> mercurial (benefits: de facto backward compatible, drawbacks: 
>>>>> limited to what the tool can do in previous versions)
>>>>> >
>>>>> >3) Use contrib/ extension from the latest version of 
>>>>> mercurial (benefits: no duplicate code, easier maintenance, new 
>>>>> benchmarks profits to both tools. Drawbacks: not backward 
>>>>> compatible for now (it works only for >= 3.7 versions)). We could 
>>>>> also implement some glue code, either in the tracking tool or in 
>>>>> contrib/, to list available benchmarks and theirs parameters.
>>>>> >
>>>>> >At this stage of the project my advice is to use 1), but we could 
>>>>> also have a mix of 1) and 3). It depends on how fast the internal 
>>>>> API changes and on your short/mid/long term objectives for the 
>>>>> level of integration of the tracking tool.
>>>> My first instinct tells me we should use approach #2, and teach the 
>>>> benchmark suite what Mercurial release a given perf* command first 
>>>> appeared in. This can be automated by reading the output of `hg 
>>>> help -e perf`.
>>>> #1 sounds like a particularly bad idea given the deliberately high 
>>>> level of churn in our internal APIs. If we try to maintain an 
>>>> external tool that also tries to remember the details of that 
>>>> churn, I'm pretty sure it will rot and fall out of use in short order.
>>>> #3 has a similar (though lesser) disadvantage. I also don't see a 
>>>> strong need to run new perf checks against old versions, with the 
>>>> exception of perhaps the current stable release.
>>>> All that said, please convince me I'm wrong.:)
>>>> Regarding ASV: is there any way we can use it to instrument our 
>>>> code (including profiling and memory tracking) without having it 
>>>> call into the internals? Specifically, is there a way that we can 
>>>> either feed it metrics that we output from the perf extension, or 
>>>> integrate it into such that it gets optionally imported 
>>>> (similar to how you can optionally use ipdb via the --debugger 
>>>> flag, wired up in
>>> Ok if the internal API used in often change we can exclude 
>>> #1 (at least until the benchmark suite is included into mercurial 
>>> repository).
>>> Nevertheless #1 can still be useful if you want to bisect an old 
>>> regression (and have no relevant benchmark code in the bisect 
>>> range): this can be done by just transforming the benchmark 
>>> into a regular asv benchmark (quite an easy step), handling potential 
>>> backward compatibility issues, then running the bisect.
>>> With #2 we cannot fix potential "bugs" in when benchmarking 
>>> old revisions. For instance, if an internal change alters a benchmark 
>>> so that it no longer computes the same thing as before, and this is 
>>> fixed later on, you will keep getting wrong values even if you re-run 
>>> the benchmarks (but this is not a big issue).
>>> About #3, currently is not backward compatible with < 3.7 
>>> versions but it seems related to the extension code (not benchmark 
>>> code). After embedding commands.formatteropts and 
>>> commands.debugrevlogopts and removing the norepo parameter from 
>>> perflrucache, I can run some benchmarks on 3.0 versions.
>>> For now, #2 (and #3) can be achieved with ASV by writing a "track" 
>>> benchmark that runs the given command in a subprocess and 
>>> returns the value.
>>> To get a closer integration (and enable the profiling features of ASV) 
>>> we could write a module that declares benchmarks (setup code and body 
>>> code) and let both asv and use it. Actually, asv benchmarks 
>>> themselves could be a candidate because they are just declarative 
>>> code (functions and classes with a naming convention and a "setup" 
>>> attribute on the function); there is no dependency on asv here.
>>> Another idea could be to call commands programmatically in 
>>> ASV and possibly monkeypatch the gettimer() function to transform 
>>> the command into a regular asv "time benchmark".
>>> As a first step, I think we can go for the simplest solution (ie. 
>>> #2) and keep in mind a future inclusion (or dedicated api) in 
>>> mercurial.
>> -- 
>> Philippe Pepiot

Philippe Pepiot
