Development of a performance tracking tool for Mercurial

Pierre-Yves David pierre-yves.david at ens-lyon.org
Tue Mar 29 18:21:12 EDT 2016



On 03/22/2016 01:54 AM, David Douard wrote:
> Hi everyone,
>
> we (Philippe and a bit of myself, at Logilab) are beginning to work on a
> performance tracking system for Mercurial.
>
> The need for such a tool has been expressed by Pierre-Yves, who managed
> to get the project financed by fb.
>
> We've started (mostly Philippe, actually) by studying several existing
> solutions we could start from.
>
> Below is a "quick" report on what we have done so far.
>
> There is an html version of this document on
>
>    https://hg.logilab.org/review/hgperf/raw-file/tip/docs/tools.html

Thanks for posting this, I've inlined the document in my reply for 
easier commenting.

> #####################################################
> Mercurial performance regression detection and report
> #####################################################
>
> Objectives
> ==========
>
> Mercurial's code changes fast and we must detect and prevent performance
> regressions as early as possible.
>
> * Automatic execution of performance tests on a given Mercurial revision
> * Store the performance results in a database
> * Expose the performance results in a web application (with graphs, reports, dashboards etc.)
> * Provide some regression detection alarms with email notifications
>
> Metrics
> ~~~~~~~
>
> We already have code that produces performance metrics:
>
> * Commands from the perf extension in contrib/perf.py
> * Revset performance tests contrib/revsetbenchmarks.py
> * Unit test execution time
> * Annotated portions of unit test execution time

Note that we don't have official annotations (and therefore no timing 
for them). But the phrasing seems fixed on the wiki page, so I'm mostly 
talking to third-party readers here.
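
Also for third-party readers: the perf commands are regular hg commands
that print timing lines on stdout, so a harness mostly has to run them
and parse the output. A rough sketch (the extension path and the output
format are from my memory of contrib/perf.py, double-check before
relying on it):

    import re
    import subprocess

    def run_perf(repo, command):
        # Run one contrib/perf.py command and return its wall time.
        out = subprocess.check_output(
            ['hg', '--config', 'extensions.perf=contrib/perf.py',
             '-R', repo, command],
            universal_newlines=True)
        # Expected shape (from memory):
        #   ! wall 0.001900 comb 0.000000 user 0.000000 sys 0.000000 (best of 1477)
        m = re.search(r'wall ([\d.]+)', out)
        if m is None:
            raise ValueError('unexpected perf output: %r' % out)
        return float(m.group(1))

    print(run_perf('.', 'perfstatus'))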

> These existing metrics will be used (after some refactoring of the tools
> that produce them), but we may need more metrics written specifically for
> the purpose of performance regression detection.
>
>
> Expected Results
> ~~~~~~~~~~~~~~~~
>
> Expected results are still to be discussed. For now, we aim at having a simple tool
> to track performance regressions on the two branches of the main Mercurial
> repository (stable and default).
>
> However, there are some open questions for mid-term objectives:
>
> - What revisions of the Mercurial source code should we run the performance
> regression tool on? (public changesets on the main branch only? which branches? ...)

Let's focus on public changesets for now.

> - How do we manage the non-linear structure of a Mercurial history?

That's a fun question. The Mercurial repository is mostly linear as long 
as only one branch is considered. However:

  - we don't (and have no reason to) enforce linearity,
  - the big picture, with multiple branches, is still non-linear.

> - What kind of aggregations / comparisons do we want to be able to do? Should these
> be available through a "query language" or can they be hard-coded in the
> performance regression tool?

I think we can start with whatever is simplest. But room for evolution 
in this area is probably one of the criteria for picking a tool.

> Existing tools
> ==============

It would be nice to boil that down to a list of criteria (e.g. runs 
locally, handles branches, regression algorithm, setup cost, storage 
format, etc.) and put all of them in a big table on the wiki page. That 
would help compare the tools with each other and pick a winner.

>
> Airspeed velocity
> ~~~~~~~~~~~~~~~~~
>
> - http://asv.readthedocs.org/
> - used by the http://www.astropy.org/ project and inspired by https://github.com/pydata/vbench
> - Code: https://github.com/spacetelescope/asv
> - Presentation (2014): https://www.youtube.com/watch?v=OsxJ5O6h8s0
> - Python, Javascript (http://www.flotcharts.org/)
>
>
> This tool aims at benchmarking Python packages over their lifetime.
> It is mainly a command line tool, ``asv``, that runs a series of benchmarks (described
> in a JSON configuration file) and produces a static HTML/JS report.
>
> When running a benchmark suite, ``asv`` takes care of cloning/pulling the source
> repository, setting up a virtualenv, and running the configured tasks in it.
>
> Results of each benchmark execution are stored in a "database" (consisting of
> JSON files). This database is used to produce evolution plots of the time required
> to run a test (or of any metric; out of the box, asv supports 4 types of benchmark:
> timing, memory, peak memory and tracking), and to run the regression detection algorithms.
>
> One key feature of this tool is that it's very easy for every developer to use it in
> their own development environment. For example, it provides an ``asv compare`` command
> to compare the results of any two revisions.
>
> However, asv will require some work to fit our needs:
>
> - The main drawback of asv is the fact that it's designed with commit date as the X
> axis. We must adapt asv's code to properly handle the non-linearity of dates
> (see https://github.com/spacetelescope/asv/issues/390)
> - Tags are displayed in the graphs as secondary X-axis labels tied to the commit
> date of the tag; they should be displayed as annotations on the data points instead.
>
>
> :Pros:
>
> - Complete and covers most of our needs (and more)
> - Handles Mercurial repositories
> - Generates a static website with dashboards and interactive graphs
> - Detects regressions, implements a step detection algorithm: http://asv.readthedocs.org/en/latest/dev.html#module-asv.step_detect
> - Parametrized benchmarks
> - Can collect metrics from multiple machines
> - Shows tags on the graphs, links to commits
> - Framework to write time, memory or custom benchmarks
> - Facilities to run benchmarks (run against a revset, compute only missing values, etc.)
> - Can easily be used on the developer side as well (before submitting patches)
> - Seems easily extensible through a plugin system
>
> :Cons:
>
> - No email notifications
> - Needs to plot graphs by revision number instead of commit date
> - The per-branch graphs need to be fixed for Mercurial

This one seems pretty solid and I like the idea of being able to run it 
locally.

The dashboard seems a bit too simple to me, and I'm a bit worried here; 
the branch handling is another unknown.

How hard would it be to implement a notification system on top of it?
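
Also for reference, writing benchmarks for asv is cheap: they are plain
Python callables whose names start with time_ (there are also mem_ and
track_ prefixes), discovered under a benchmarks/ directory. A minimal
sketch of what a Mercurial-oriented one could look like (the repo path
is made up):

    # benchmarks/benchmarks.py -- asv times any callable whose name
    # starts with "time_".
    import subprocess

    class TimeHgStatus:
        # Hypothetical clone used as benchmark data.
        repo = '/path/to/reference/repo'

        def time_status(self):
            # Timing includes hg startup cost; fine for a sketch.
            subprocess.check_call(['hg', '-R', self.repo, 'status'],
                                  stdout=subprocess.DEVNULL)

A ``params`` attribute on the class would give the parametrized variants 
mentioned above, and ``asv compare`` works on any two revisions that 
have recorded results.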

> Example: https://hg.logilab.org/review/hgperf/raw-file/1e6b03b9407c/index.html (built with a patched ASV that works around the commit date and branch issues)
>
> Codespeed
> ~~~~~~~~~
>
>
> - Code: https://github.com/tobami/codespeed
> - Python (django), Javascript
> - Used by PyPy and Twisted, example http://speed.pypy.org/
> - Web application to store benchmark results, providing graphs and basic reports
>
>
> This tool is a Python (Django) web application that can collect
> benchmark results, store them in a SQL database and analyze them. It
> provides multiple views of the results (graphs, grids, reports) and can
> generate a feed of notifications (regressions or improvements) on the
> home page.
>
> A few things need to be set up to make it work: a "project" (a VCS
> repository); an "executable" (a particular build configuration of the
> project); an "environment", the context in which tests are executed
> (CPU, OS); and a "benchmark", which has a unit and can be cross-project
> or project-specific. A "result" is then the value of a benchmark run in
> an environment, on an executable built from a particular revision of
> the project.
>
> The trend computation compares a result with the average of the three
> previous results, which produces a lot of false positives, and the key
> feature of cross-project comparison seems useless for us.
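
To illustrate why that heuristic is noisy: it boils down to something
like the check below, so a few percent of run-to-run jitter is enough to
trip it (my paraphrase of the idea, not Codespeed's actual code):

    def naive_trend(results, threshold=0.05):
        # Compare the newest result against the mean of the three
        # results before it.
        last, previous = results[-1], results[-4:-1]
        baseline = sum(previous) / len(previous)
        delta = (last - baseline) / baseline
        if delta > threshold:
            return 'regression'
        if delta < -threshold:
            return 'improvement'
        return 'stable'

    # 5% of jitter on a flat benchmark already trips it:
    print(naive_trend([100.0, 95.0, 96.0, 104.0]))  # -> 'regression'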
>
>
> :Pros:
>
> - Nice UI with colors and trends (red, green)
> - Useful for comparative benchmarks (e.g. PyPy vs CPython)
> - Generates notifications automatically (global improvement/regression or per benchmark)
> - Integration with Mercurial repositories (shows commit contents, links, etc.)
>
> :Cons:
>
> - Poor regression detection algorithm (lots of false improvement/regression alarms)
> - No email notifications
> - Needs a lot of setup

I have mixed feelings about that one. It has a pretty solid set of 
features, including some we'll really need soon (comparison of 
implementations), and it also has some solid names behind it.

But it might be a bit overcomplicated, and the regression tracking seems 
pretty bad compared to the others.
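
For the record, feeding results into it is at least simple: they are
plain HTTP POSTs to the /result/add/ endpoint. A minimal sketch, adapted
from the Codespeed README (the host and values are made up):

    from urllib.parse import urlencode
    from urllib.request import urlopen

    data = {
        'commitid': '3b16d10151c6',      # changeset hash (made up)
        'branch': 'default',
        'project': 'Mercurial',
        'executable': 'hg',
        'benchmark': 'perfstatus',
        'environment': 'bench-machine-1',
        'result_value': 0.0019,          # seconds
    }
    urlopen('http://speed.example.org/result/add/',
            urlencode(data).encode('ascii'))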

> Skia perf
> ~~~~~~~~~
>
> - https://perf.skia.org
> - Used to benchmark the skia graphic library
> - Nice UI https://perf.skia.org/alerts/
> - Code: https://github.com/google/skia-buildbot/tree/master/perf
> - Design: https://raw.githubusercontent.com/google/skia-buildbot/master/perf/DESIGN.md
> - Json format: https://raw.githubusercontent.com/google/skia-buildbot/master/perf/FORMAT.md
> - Go, Mysql, InfluxDB, Google Compute Engine, Javascript (https://www.polymer-project.org/, https://d3js.org/)
>
>
> Skia perf (https://perf.skia.org) is an interactive dashboard to display Skia (graphic library)
> performance data against multiple devices and GPUs. It provides a
> powerful interface to build custom graphs of performance data.
>
> The tool can detect regressions using a least-squares fitting method and produces a
> dashboard of regressions that can be annotated by logged-in users.
>
> It is written in Go and JavaScript, is based on Git, and relies on a complex stack
> including Google Compute Engine, so it cannot be used as is without huge adaptation.
>
>
> :Pros:
>
> - Detects regressions using a least-squares fitting method
> - Interface to set the status of a detected regression (ignore, bug)
> - Links to the commit which introduced the regression
> - Interface to build a custom graph from multiple metrics
> - Handles notifications
>
> :Cons:
>
> - Slow interface (eats browser memory)
> - Complex stack
> - Requires GCE
> - Not usable as is
> - Doesn't handle Mercurial repositories

That one seems to have a pretty neat set of advanced features. But the 
amount of work needed to adapt it to our needs seems too large for it to 
be considered seriously.
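
For third-party readers, the core idea behind that kind of step
detection fits in a few lines: try every split point, fit a constant to
each side by least squares, and keep the split that reduces the squared
error the most. A toy version (nothing to do with Skia's actual
implementation):

    def sse(xs):
        # Least-squares cost of fitting one constant to xs.
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs)

    def best_step(series):
        # Try every split point; keep the one whose two-segment fit
        # lowers the squared error the most.
        cost, split = min((sse(series[:i]) + sse(series[i:]), i)
                          for i in range(1, len(series)))
        return split, sse(series) - cost

    # The 10 -> 12 jump is found at index 4, with a large reduction:
    print(best_step([10.0, 10.1, 9.9, 10.0, 12.1, 12.0, 11.9, 12.0]))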

>
> AreWeFastYet
> ~~~~~~~~~~~~
>
> - http://arewefastyet.com/
> - tracking performance of JavaScript engines
> - Code: https://github.com/h4writer/arewefastyet
> - Python, Php, MySQL, Javascript (http://www.flotcharts.org/).
> - All in one application
>
> AreWeFastYet (http://arewefastyet.com/) is a tool that checks out the code of popular
> JavaScript engines and runs some benchmark suites against them (Octane, Kraken, SunSpider...),
> stores the results in a MySQL database and exposes them in a web application that displays
> comparative graphs and regression reports.
>
> AWFY is written in Python, PHP and JavaScript and requires a MySQL database. The
> regression detection algorithm is based on a local average comparison, and
> many things (builder machines, etc.) are hardcoded; it is specific to its purpose.
>
>
>
> :Pros:
>
> - Handles Mercurial repositories
> - Show the regression commit range http://arewefastyet.com/regressions/#/regression/1796301
>
> :Cons:
>
> - Not usable as is
> - Unclear, custom regression detection algorithm that might not work in our case

This does not seem like the droids we are looking for (too specialised).

>
>
> EzBench
> ~~~~~~~
>
> - Code: https://cgit.freedesktop.org/ezbench
> - Used to benchmark graphics-related patches on the Linux kernel.
> - Slides: https://fosdem.org/2016/schedule/event/ezbench/attachments/slides/1168/export/events/attachments/ezbench/slides/1168/fosdem16_martin_peres_ezbench.pdf
> - Shell scripts
>
> EzBench (https://cgit.freedesktop.org/ezbench) is a collection of tools to benchmark
> graphics-related patchsets on the Linux kernel. It runs the benchmark suite on a particular
> commit and stores the results as CSV files. It has tools to read the results and generate static
> HTML reports. It can also automate the bisect process to find the commit that introduced a
> regression. It is written in shell and Python and is tightly coupled to its purpose.
>
> :Pros:
>
> - Generates reports
> - Bisects performance changes automatically and confirms a detected regression by reproducing it
> - Tips for reducing variance; captures all benchmark context (hardware, libraries, versions)
>
> :Cons:
>
> - Not usable as is
> - Doesn't handle Mercurial repositories

It is unclear to me what makes it not usable as is (besides the lack of 
Mercurial support?).
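
Note that the bisect-automation part is something we could already
approximate with stock Mercurial: `hg bisect --command` runs an
arbitrary script whose exit code marks a revision good or bad. A
hypothetical check script (the threshold and the perf command are made
up):

    #!/usr/bin/env python
    # check_perf.py -- hypothetical helper for:
    #   hg bisect --command ./check_perf.py
    # Exits 0 ("good") while perfstatus stays under an arbitrary
    # threshold, non-zero ("bad") once it does not.
    import re
    import subprocess
    import sys

    THRESHOLD = 0.050  # seconds; made-up value

    out = subprocess.check_output(
        ['hg', '--config', 'extensions.perf=contrib/perf.py',
         'perfstatus'],
        universal_newlines=True)
    wall = float(re.search(r'wall ([\d.]+)', out).group(1))
    sys.exit(0 if wall < THRESHOLD else 1)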

> TimeSeries oriented Databases
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The reporting interface will provide reports (of course), graphs and dashboards so
> it's tempting to use tools like Grafana_, InfluxDB_, Graphite_ that already provide such
> features. Some of them (like Beacon_) can even provide notifications based on
> rules.
>
> But they are all based on *time* series only, and that doesn't really fit our needs:
> our problem is not linear with respect to dates, and it may become really
> tricky to use them to collect metrics on drafts or to handle merge changesets properly.
>
> Choosing such a time-series oriented database would most probably prove to be a
> poor choice, due to its structural inability to model a repository.

I have not thought too much about it yet, but I think I agree here. We 
probably want our primary view to be indexed on "revision number" or 
something similar.
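
Even something dumb would capture that: one row per (branch, revision,
metric), with the revision number as the primary axis instead of a
timestamp. A purely illustrative sketch (schema and values made up):

    import sqlite3

    db = sqlite3.connect('perf.db')
    db.execute('''CREATE TABLE IF NOT EXISTS result (
                      branch TEXT,
                      rev    INTEGER,  -- revision number, not a date
                      node   TEXT,     -- changeset hash, for stability
                      metric TEXT,
                      value  REAL,
                      PRIMARY KEY (branch, rev, metric))''')
    db.execute('INSERT OR REPLACE INTO result VALUES (?, ?, ?, ?, ?)',
               ('default', 28600, '3b16d10151c6', 'perfstatus', 0.0019))
    # The "series" we plot is then simply ordered by rev, per branch:
    for rev, value in db.execute(
            'SELECT rev, value FROM result '
            'WHERE branch = ? AND metric = ? ORDER BY rev',
            ('default', 'perfstatus')):
        print(rev, value)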

> CI Builders
> ===========

Is there any reason not to go with buildbot here?

> Using an existing CI builder would take care of the hard part: running performance
> tests on build machines and triggering builds automatically on SCM changes.
>
>
> Buildbot
> ~~~~~~~~
>
> The Mercurial project already uses buildbot: http://buildbot.mercurial-scm.org/
>
> Buildbot seems very extensible and configurable through Python
> code/configuration and has a read-only HTTP/JSON API.
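
(For the curious: with the 0.8.x series, that read-only API lives under
/json/, something like the sketch below. I have not double-checked the
exact paths, so treat them as an assumption.)

    import json
    from urllib.request import urlopen

    base = 'http://buildbot.mercurial-scm.org/json'
    builders = json.loads(
        urlopen(base + '/builders').read().decode('utf-8'))
    print(sorted(builders))  # builder names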
>
>
> Jenkins
> ~~~~~~~
>
> https://jenkins-ci.org has a lot of features, including smart assignment of builds
> to slaves (e.g. one concurrent build per slave), parametrized builds (e.g. by revision
> hash) and artifact gathering (e.g. performance results). All features are also
> available through a REST API.
>
> Jenkins (and its plugins) are written in Java.
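
For what it's worth, that REST API is reachable by appending /api/json
to most Jenkins URLs; the host and job name below are made up:

    import json
    from urllib.request import urlopen

    url = 'http://ci.example.org/job/hg-perf/lastBuild/api/json'
    build = json.loads(urlopen(url).read().decode('utf-8'))
    print(build['result'], build['duration'])  # e.g. "SUCCESS", ms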
>
>
> .. _Grafana: http://grafana.org/
> .. _InfluxDB: https://influxdata.com/
> .. _Graphite: http://graphite.wikidot.com/
> .. _Beacon: https://github.com/klen/graphite-beacon


-- 
Pierre-Yves David
