[![Build Status](https://travis-ci.org/gz/autoperf.svg)](https://travis-ci.org/gz/autoperf)
[![Crates.io](https://img.shields.io/crates/v/autoperf.svg)](https://crates.io/crates/autoperf)
[![docs.rs/autoperf](https://docs.rs/autoperf/badge.svg)](https://docs.rs/crate/autoperf/)
# autoperf
autoperf simplifies the instrumentation of programs with performance
counters on Intel machines. Rather than trying to learn how to measure every
event and manually programming event values in counter registers or perf, you
can use autoperf, which repeatedly runs your program until it has measured
every single performance event on your machine. autoperf tries to compute a
schedule that maximizes the number of events measured per run and
minimizes the total number of runs, while avoiding multiplexing of events on
counters.
## Background
Performance monitoring units typically distinguish between performance events and counters.
Events refer to observations on the micro-architectural level
(e.g., a TLB miss, a page-walk etc.), whereas counters are hardware registers that
count the occurrence of events. The figure on the right shows the number of different
observable events for different Intel micro-architectures. Note that current systems
provide a very large choice of possible events to monitor. The number of measurable
counters per PMU is limited (typically from two to eight). For example, if the same
events are measured on all PMUs on a SkylakeX (Xeon Gold 5120) machine, we can only
observe a maximum of 48 different events (without sampling). autoperf simplifies the process
of fully measuring and recording every performance event for a given program.
In our screen session above, recorded on a SkylakeX machine with ~3500 distinct events,
we can see how autoperf automatically runs a program 1357 times while measuring and recording
a different set of events in every run.
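
To make the events-versus-counters trade-off concrete, here is a minimal Python sketch of packing events into runs so that no counter is multiplexed. It is not autoperf's actual scheduler: the function name, the unit names (`core`, `uncore`) and the event names are hypothetical, and the sketch assumes that any event of a unit can be placed on any counter of that unit, which real hardware does not guarantee.

```python
from math import ceil


def schedule_runs(events_by_unit, counters_per_unit):
    """Greedily pack events into runs without multiplexing.

    events_by_unit: dict mapping a monitoring unit name to its list of events.
    counters_per_unit: dict mapping the same unit names to the number of
        hardware counters available on that unit.
    Returns a list of runs; each run maps a unit name to the events
    measured on that unit during the run.
    """
    # The unit with the worst events-to-counters ratio dictates the run count.
    n_runs = max(
        (ceil(len(events) / counters_per_unit[unit])
         for unit, events in events_by_unit.items()),
        default=0,
    )
    runs = [dict() for _ in range(n_runs)]
    for unit, events in events_by_unit.items():
        width = counters_per_unit[unit]
        # Fill every counter of the unit in each run until events run out.
        for i in range(0, len(events), width):
            runs[i // width][unit] = events[i:i + width]
    return runs


# Hypothetical example: 10 events on a 4-counter unit, 3 on a 2-counter unit.
events = {
    "core": [f"CORE_EVENT_{i}" for i in range(10)],
    "uncore": [f"UNCORE_EVENT_{i}" for i in range(3)],
}
counters = {"core": 4, "uncore": 2}
runs = schedule_runs(events, counters)
```

In this example the `core` unit needs three runs and the `uncore` unit only two, so the program has to run three times in total.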
# Installation
autoperf is known to work with Ubuntu 18.04 on Skylake and
IvyBridge/SandyBridge architectures. All Intel architectures should work;
please file a bug report if yours doesn't. autoperf builds on `perf` from the
Linux project and a few other libraries that can be installed using:
```
$ sudo apt-get update
$ sudo apt-get install likwid cpuid hwloc numactl util-linux
```
To run the example analysis scripts, you'll need these python3 libraries:
```
$ pip3 install ascii_graph matplotlib pandas argparse numpy
```
You'll also need the *nightly version* of the rust compiler which is
best installed using rustup:
```
$ curl https://sh.rustup.rs -sSf | sh -s -- -y --default-toolchain nightly
$ source $HOME/.cargo/env
```
autoperf is published on crates.io, so once you have rust and cargo installed,
you can get it directly from there:
```
$ cargo +nightly install autoperf
```
Or alternatively, clone and build the repository yourself:
```
$ git clone https://github.com/gz/autoperf.git
$ cd autoperf
$ cargo build --release
$ ./target/release/autoperf --help
```
autoperf uses perf internally to interface with Linux and the performance
counter hardware. perf recommends that the following settings are disabled,
so autoperf checks their values and refuses to start unless they are set as
shown below:
```
$ sudo sh -c 'echo 0 > /proc/sys/kernel/kptr_restrict'
$ sudo sh -c 'echo 0 > /proc/sys/kernel/nmi_watchdog'
$ sudo sh -c 'echo -1 > /proc/sys/kernel/perf_event_paranoid'
```
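
If you want to verify these settings yourself before starting a long profiling session, a small check along the following lines may help. This is only a sketch and not the check autoperf performs internally; the function name `check_settings` and its `sys_root` parameter are made up for illustration, and the expected values are simply the ones from the commands above.

```python
from pathlib import Path

# Expected values, taken from the commands above.
EXPECTED = {
    "kernel/kptr_restrict": "0",
    "kernel/nmi_watchdog": "0",
    "kernel/perf_event_paranoid": "-1",
}


def check_settings(sys_root="/proc/sys"):
    """Return {setting: (actual, expected)} for every mismatching setting.

    sys_root is a parameter only so the check can be pointed at a
    different directory, e.g. for testing.
    """
    mismatches = {}
    for name, expected in EXPECTED.items():
        try:
            actual = (Path(sys_root) / name).read_text().strip()
        except OSError:
            actual = None  # file is missing or not readable
        if actual != expected:
            mismatches[name] = (actual, expected)
    return mismatches
```

Calling `check_settings()` returns an empty dictionary when all three settings match, and otherwise lists each offending setting together with the value found and the value expected.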
# Usage
autoperf has a few commands; use `--help` to get an overview of all the
options.
## Profiling
The **profile** command instruments a single program by running it multiple times
until every performance event is measured. For example,
```
$ autoperf profile sleep 2
```
will repeatedly run `sleep 2` while measuring different performance events
with performance counters every time. Once completed, you will find an `out`
folder with many csv files that contain measurements from individual runs.
## Aggregating results
To combine all those runs into a single CSV result file you can use the
**aggregate** command:
```
$ autoperf aggregate ./out
```
This will do some sanity checking and produce a `results.csv`
([reduced example](../master/doc/results.csv)) file which contains
all the measured data.
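
Before running the analysis scripts it can be useful to look at what the aggregated file contains. The following sketch uses only the Python standard library and makes no assumption about the column names; `peek_csv` is a hypothetical helper, and you need to pass it the path where your `results.csv` was written.

```python
import csv
from itertools import islice


def peek_csv(path, n_rows=5):
    """Return (header, first n_rows data rows) of a CSV file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])          # empty list for an empty file
        rows = list(islice(reader, n_rows))
    return header, rows
```

The header shows which columns are available, and the first few rows give a quick impression of the recorded values.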
## Analyze results
Performance events are measured individually on every core (and other
monitoring units). The `timeseries.py` script can aggregate events by taking the
average, stddev, min, max etc. and producing a time-series matrix ([see a
reduced example](../master/doc/timeseries.csv)).
```
$ python3 analyze/profile/timeseries.py ./out
```
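
The aggregation idea can be sketched in a few lines of Python. This is not the code of `timeseries.py`: the function names and the input layout (one list of values per core, one value per time step) are assumptions, as are the `STD`, `MIN` and `MAX` column prefixes and the use of the population standard deviation.

```python
from statistics import mean, pstdev


def aggregate_across_cores(samples):
    """Summarize the per-core values of one event at one time step."""
    return {
        "AVG": mean(samples),
        "STD": pstdev(samples),  # population stddev, an assumption
        "MIN": min(samples),
        "MAX": max(samples),
    }


def timeseries_rows(per_core_series, event):
    """Build one row per time step with columns named '<STAT>.<event>'.

    per_core_series: dict mapping a core id to its list of values,
    one per time step; all lists are assumed to have equal length.
    """
    rows = []
    # zip(*...) groups the values of all cores belonging to the same time step.
    for values in zip(*per_core_series.values()):
        stats = aggregate_across_cores(values)
        rows.append({f"{stat}.{event}": v for stat, v in stats.items()})
    return rows


# Hypothetical event measured on two cores over two time steps.
rows = timeseries_rows({0: [1, 2], 1: [3, 6]}, "SOME_EVENT")
```

Each row then holds one value per statistic for the event, so a matrix with one row per time step and one column per statistic and event falls out naturally.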
Now you have all the data, so you can start asking some questions. As an
example, the following script tells you how events were correlated
when your program was running:
```
$ python3 analyze/profile/correlation.py ./out
$ open out/correlation_heatmap.png
```
Event correlation for the `autoperf profile sleep 2` command
above looks like this (every dot represents the correlation of the timeseries
between two measured performance events; this is from a Skylake machine with
around 1700 non-zero event measurements):
![Correlation Heatmap](/doc/correlation_heatmap.png)
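
For intuition, every dot in such a heatmap can be thought of as a correlation coefficient between two event timeseries. The sketch below computes Pearson correlation with the standard library only; whether `correlation.py` uses exactly this coefficient is an assumption, and the function names and event names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long timeseries."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")  # a constant series has no defined correlation
    return cov / sqrt(vx * vy)


def correlation_matrix(series):
    """Map every (event_a, event_b) pair to its correlation."""
    names = sorted(series)
    return {(a, b): pearson(series[a], series[b]) for a in names for b in names}


# Hypothetical timeseries for three events.
matrix = correlation_matrix({
    "EVENT_A": [1, 2, 3, 4],
    "EVENT_B": [2, 4, 6, 8],
    "EVENT_C": [4, 3, 2, 1],
})
```

Events that always read zero or stay constant have no defined correlation, which is one reason to restrict such a plot to non-zero event measurements.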
You can look at individual events too:
```
$ python3 analyze/profile/event_detail.py --resultdir ./out --features AVG.OFFCORE_RESPONSE.ALL_RFO.L3_MISS.REMOTE_HIT_FORWARD
```
![Plot events](/doc/perf_event_plot.png)
There are more scripts in the `analyze` folder to better work with the captured
data-sets. Have a look.
## What do I use this for?
autoperf allows you to quickly gather lots of performance (or training) data and
reason about it quantitatively. For example, we initially developed autoperf to
build ML classifiers that the Barrelfish scheduler could use to detect
application slowdown and make better scheduling decisions. autoperf can gather
that data to generate such classifiers without requiring domain knowledge about
events, aside from how to measure them.
You can read more about our experiments here:
* https://dl.acm.org/citation.cfm?id=2967360.2967375
* https://www.research-collection.ethz.ch/handle/20.500.11850/155854
Last but not least, autoperf can potentially be useful in many other scenarios:
* Find out what performance events are relevant for your workload
* Analyze and find performance issues in your code or across different versions of your code
* Generate classifiers to detect hardware exploits (side channels/spectre/meltdown etc.)
* ...