`iocost-tune` benchmark
=======================

`iocost-tune` analyzes the results of an `iocost-qos` benchmark to identify
behavior characteristics of the IO device and compute iocost QoS parameter
solutions. If the specified bench series doesn't include a preceding
`iocost-qos` instance, `iocost-tune` runs `iocost-qos` as follows:

```
   iocost-qos:dither,vrate-max=125.0,vrate-intvs=25
```

Analyzed Metrics
================

By default, `iocost-tune` analyzes how the following metrics change as vrate
is throttled:

#### MOF (Memory Offloading Factor)

How much of `rd-hashd`'s memory footprint can be offloaded to the IO device.
This is a latency-limited bandwidth performance metric. See the `common` doc
and `resctl-demo` for more information on memory offloading.

#### aMOF (Adjusted Memory Offloading Factor)

How much of `rd-hashd`'s memory footprint can be offloaded to the IO device
while being able to protect `rd-hashd` against interference. This is always
equal to or less than `MOF` for the same vrate. For latency critical use
cases, this is the memory footprint that can be supported safely by the IO
device.

#### aMOF-delta (Adjusted Memory Offloading Factor Delta)

The difference between `MOF` and `aMOF`. The wider the delta, the more
difficult it is to size the workload for protection, as a size which
saturates the machine will be too big to protect.

#### isol-01 (Isolation Factor)

Isolation factor is defined as

```
   MEASURED_RPS / TARGET_RPS
```

and indicates the quality of protection. It's measured every second and one
of the percentiles (the 1st by default) is compared against the threshold
(90% by default) to determine whether protection is good enough.

This is what guides whether `aMOF` needs to be pushed lower. If the recorded
value for a given vrate is lower than the threshold, it indicates that
sufficient protection couldn't be achieved even at the smallest workload
size.
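
As a rough illustration of the check described above, the following Python
sketch computes per-second isolation factors and compares a low percentile
against the threshold. The sample RPS values, the function names and the
nearest-rank percentile method are assumptions made for this example, not
`resctl-bench` internals.

```python
import math

def isolation_factors(measured_rps, target_rps):
    """Per-second isolation factor: MEASURED_RPS / TARGET_RPS."""
    return [m / target_rps for m in measured_rps]

def percentile(values, pct):
    """Nearest-rank percentile (an assumed method, chosen for simplicity)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

def protection_ok(measured_rps, target_rps, pct=1, threshold=0.90):
    """True if the pct'th percentile isolation factor meets the threshold."""
    factors = isolation_factors(measured_rps, target_rps)
    return percentile(factors, pct) >= threshold

# Hypothetical per-second RPS samples against a target of 1000 RPS.
samples = [1000, 990, 985, 970, 940, 995, 1000, 960, 980, 1005]
print(protection_ok(samples, 1000))
```

With only ten samples the 1st percentile is simply the worst second (0.94
here), so this hypothetical run passes the default 90% threshold.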

#### `lat-imp` (Latency Impact)

Latency impact is defined as

```
   (MEASURED_LATENCY - BASELINE_LATENCY) / BASELINE_LATENCY
```

where latency is the end-to-end `rd-hashd` request completion latency.
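
A minimal worked example of the formula in Python. The latency values are
hypothetical and the function name is chosen only for this illustration.

```python
def latency_impact(measured_latency, baseline_latency):
    """(MEASURED_LATENCY - BASELINE_LATENCY) / BASELINE_LATENCY."""
    if baseline_latency <= 0:
        raise ValueError("baseline latency must be positive")
    return (measured_latency - baseline_latency) / baseline_latency

# Hypothetical request latencies in milliseconds: 60ms measured under
# interference against a 50ms baseline.
print(f"{latency_impact(60.0, 50.0):.0%}")
```

A result of 0 means the measured latency matched the baseline. Larger values
mean the request latency grew by that fraction of the baseline.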

#### `work-csv` (Work Conservation)

Measures how much IO bandwidth the kernel was able to preserve while
protecting against a memory hog. The loss is caused by inefficiency in the
current implementation of anonymous memory throttling and doesn't reflect IO
device characteristics.

#### `rlat-XX-YY` and `wlat-XX-YY` (Read and Write Latencies)

IO read and write completion latencies. See `common` doc for more info.


Solutions
=========

The following iocost QoS solutions are computed by default. Note that the
descriptions of the solution logic aren't comprehensive.

#### `naive`

It targets 100% of what the model parameters describe (`fio` measured
maximum). vrate will be throttled down to 75% based on the p99 read and
write latencies.

#### `bandwidth`

This targets the minimum vrate which renders the maximum MOF, that is, the
right inflection point of the MOF curve. Sizing memory footprint may be
challenging with this solution - a workload sized for saturation may be too
big for isolation.

#### `isolated-bandwidth`

This targets the maximum vrate at which `rd-hashd`'s isolatable memory
footprint is the biggest clamped between the `isolation` and `bandwidth`
solutions. This is the vrate at which the biggest memory footprint can be
isolated.

#### `isolation`

This targets the maximum vrate which renders the minimum aMOF-delta. Sizing
for isolation is the easiest with this solution - a workload sized for
saturation is as close to being isolable as possible on the device.

#### `rlat-99-q[1-4]`

Each of these solutions targets a quarter of the 99th percentile read
latency spread. `q1` targets 100% vrate and modulates it down to the 75%
point, `q2` starts there, and so on. These parameters can be useful for
experimenting to see what works if the other solutions aren't available or
adequate.


Reading Results
===============

When the `format` subcommand is used to print the full result, graphs like
the following are printed:

```
   $ resctl-bench -r result.json format iocost-tune
```

```
       |
       |
       |
       |
       |
   1.6-|                                                                       ●
       |                                       ●             ●
       |                                  ■■●■■■■■■■■■■●■■■■■■■■■■■●■■■■■■■■■■■
       |                                 ■        ● ●     ●     ●           ●
 M     |                                                                 ●
 O     |                                 ●                            ●
 F 1.4-|                              ● ■
 @     |            ●                  ■
 1     |                              ■
 6     |                             ■
       |
       |                            ■
       |          ●                ■
   1.2-|       ■■■■■■■■●■■■■■■■■■■■●
       |                  ●  ●  ●
       |
       |       ●
       |
       |
     1+--------------------------------------------------------------------------------
      |                      |                       |                      |
      0                     40                      80                     120
          vrate 14.7-124.7 (min=1.190 max=1.509 L-infl=48.4 R-infl=62.1 err=0.012)
```

The circles are the data points from `iocost-qos` results and the squares
form the fitted line, which is used to interpret the noisy source data. The
above shows how MOF changes at different vrates. We can see that the right
inflection point is at a vrate of 62.1%, which according to the above
description should be the `bandwidth` solution.
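
One way to read the summary line under the graph is as a three-segment
piecewise-linear fit: flat at `min` up to `L-infl`, a straight slope between
the two inflection points, and flat at `max` beyond `R-infl`. The Python
sketch below evaluates such a line using the values from the graph above.
The function name and the exact shape of the fit are assumptions drawn from
the plot, not taken from the `iocost-tune` implementation.

```python
def fitted_value(vrate, vmin, vmax, l_infl, r_infl):
    """Evaluate an assumed flat-slope-flat fitted line at the given vrate."""
    if vrate <= l_infl:
        return vmin
    if vrate >= r_infl:
        return vmax
    frac = (vrate - l_infl) / (r_infl - l_infl)
    return vmin + frac * (vmax - vmin)

# Values copied from the MOF graph summary line above.
MOF_FIT = dict(vmin=1.190, vmax=1.509, l_infl=48.4, r_infl=62.1)

for vrate in (40.0, 55.25, 62.1, 100.0):
    print(f"vrate={vrate:6.2f}% MOF~{fitted_value(vrate, **MOF_FIT):.3f}")
```

Under this reading, raising vrate past the right inflection point buys no
additional MOF, which is why that point is interesting as a solution target.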

In the `Solutions` section, we can find the matching solution:

```
   [bandwidth] MOF=max
     info: scale=68.35% MOF=1.509@16 aMOF=1.287 aMOF-delta=0.118 isol-01=91.83%
     rlat: 50-mean= 221u 50-99= 469u 50-100= 947u 99-mean= 3.6m 99-99=12.3m 100-100= 294m
     wlat: 50-mean=34.5u 50-99= 189u 50-100= 781u 99-mean= 597u 99-99= 8.3m 100-100= 363m
     model: rbps=1454473514 rseqiops=156751 rrandiops=152357 wbps=678545224 wseqiops=145498 wrandiops=62214
     qos: rpct=0.00 rlat=3647 wpct=0.00 wlat=597 min=100.00 max=100.00
```

`scale=68.35%` shows how much the solution is throttling from the original
model parameters and should match the vrate from the MOF right inflection
point. However, the inflection point was 62.1% and our solution is 68.35%.
This is because the solution applies some heuristics, based on the steepness
of the slope and the variance, to avoid sitting right on top of the steep
slope.

The `model` and `qos` lines are the determined parameters that can be fed to
the kernel. For example, to apply them to `nvme0n1`, which has the device
number `259:0`, and enable iocost on it:

```
   $ echo '259:0 rbps=1454473514 rseqiops=156751 rrandiops=152357 wbps=678545224 wseqiops=145498 wrandiops=62214' > /sys/fs/cgroup/io.cost.model
   $ echo '259:0 enable=1 rpct=0.00 rlat=3647 wpct=0.00 wlat=597 min=100.00 max=100.00' > /sys/fs/cgroup/io.cost.qos
```
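
The same two lines can be assembled programmatically. The Python sketch
below looks up a device number with the standard `os.stat()`, `os.major()`
and `os.minor()` calls and builds the strings to write. The helper names
`devnr_of` and `iocost_lines` are hypothetical, and the actual writes are
left commented out because they require root and a cgroup2 mount at
`/sys/fs/cgroup`.

```python
import os

def devnr_of(dev_path):
    """Return the 'MAJOR:MINOR' device number of a block device node."""
    st = os.stat(dev_path)
    return f"{os.major(st.st_rdev)}:{os.minor(st.st_rdev)}"

def iocost_lines(devnr, model, qos):
    """Build the strings to write to io.cost.model and io.cost.qos."""
    return f"{devnr} {model}", f"{devnr} enable=1 {qos}"

# The `model:` and `qos:` values copied from the [bandwidth] solution above.
MODEL = ("rbps=1454473514 rseqiops=156751 rrandiops=152357 "
         "wbps=678545224 wseqiops=145498 wrandiops=62214")
QOS = "rpct=0.00 rlat=3647 wpct=0.00 wlat=597 min=100.00 max=100.00"

# On a real system, devnr_of("/dev/nvme0n1") would supply the first argument.
model_line, qos_line = iocost_lines("259:0", MODEL, QOS)
print(model_line)
print(qos_line)

# with open("/sys/fs/cgroup/io.cost.model", "w") as f:
#     f.write(model_line)
# with open("/sys/fs/cgroup/io.cost.qos", "w") as f:
#     f.write(qos_line)
```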

Note that the QoS `min` and `max` are fixed at 100% instead of 68.35%. This
is because the model parameters are scaled instead. `iocost-tune` always
scales the model parameters so that the QoS `max` always ends up 100%.
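
The scaling can be checked against the numbers in this document. The Python
sketch below divides the `[bandwidth]` model parameters by the unscaled
(`scale=100.0%`) model parameters printed for the `vrate` example under
Properties. It assumes both outputs derive from the same base model, which
the matching ratios suggest but the document does not state.

```python
# Model parameters from the scale=100.0% `vrate` example under Properties.
BASE = dict(rbps=2127854279, rseqiops=229322, rrandiops=222894,
            wbps=992692782, wseqiops=212859, wrandiops=91017)

# Model parameters from the [bandwidth] solution above (scale=68.35%).
BANDWIDTH = dict(rbps=1454473514, rseqiops=156751, rrandiops=152357,
                 wbps=678545224, wseqiops=145498, wrandiops=62214)

def scale_ratios(base, scaled):
    """Ratio of each scaled model parameter to its unscaled counterpart."""
    return {key: scaled[key] / base[key] for key in base}

for key, ratio in scale_ratios(BASE, BANDWIDTH).items():
    print(f"{key:>9}: {ratio:.2%}")
```

Every ratio comes out at about 68.35%, matching the reported `scale`, while
the QoS `min` and `max` stay at 100%.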

`iocost-tune` can also generate a pdf file containing all the results:

```
   $ resctl-bench -r result.json format iocost-tune:pdf
```


Merging
=======

`iocost-qos` benchmark results can be noisy due to SSD behavior
inconsistencies and other system behavior variances. While `iocost-tune`
tries its best to make sense of the noisy data, nothing improves solution
quality like more data points.

While increasing the number of `iocost-qos` intervals is one way to obtain
more data points, the default 25 interval runs can already take more than
six hours. `iocost-tune` supports result merging so that data points from
multiple separate benchmark runs can be combined to yield more accurate
results.

For example, the following command merges the results in `result-0.json`,
`result-1.json` and `result-2.json` into `merged.json`.

```
   $ resctl-bench -r merged.json merge result-0.json result-1.json result-2.json
```

Note that the command doesn't specify the benchmark type to merge.
`resctl-bench` automatically finds all results which are mergeable, groups
them into source groups and merges each group. The `iocost-tune` source
results are grouped by:

* Memory profile.
* Storage device model.
* Benchmark ID if `--by-id` is specified.
* `resctl-bench` version unless `--ignore-versions` is specified.
* `iocost-qos` bench properties except for `vrate-intvs`.

If `--multiple` is specified, all source groups are merged; otherwise, the
group with the largest number of sources is selected and merged.
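
The grouping and selection rules above can be sketched in Python as follows.
The dictionary field names (`mem_profile`, `storage_model`, `bench_id`,
`version`, `qos_props`, `path`) and both function names are hypothetical and
exist only for this illustration; they do not reflect the `resctl-bench`
result format.

```python
from collections import defaultdict

def group_sources(sources, by_id=False, ignore_versions=False):
    """Group source results by the keys listed above."""
    groups = defaultdict(list)
    for src in sources:
        key = [src["mem_profile"], src["storage_model"]]
        if by_id:
            key.append(src["bench_id"])
        if not ignore_versions:
            key.append(src["version"])
        # iocost-qos properties other than vrate-intvs take part in the key.
        props = {k: v for k, v in src["qos_props"].items() if k != "vrate-intvs"}
        key.append(tuple(sorted(props.items())))
        groups[tuple(key)].append(src["path"])
    return list(groups.values())

def select_groups(groups, multiple=False):
    """All groups with --multiple, otherwise only the largest one."""
    if multiple or not groups:
        return groups
    return [max(groups, key=len)]

def make_source(path, storage, intvs):
    """Build a hypothetical source record for the example."""
    return dict(path=path, mem_profile=16, storage_model=storage,
                bench_id="", version="1.0.0",
                qos_props={"vrate-max": "125", "vrate-intvs": intvs})

sources = [
    make_source("result-0.json", "SSD-A", "25"),
    make_source("result-1.json", "SSD-A", "50"),  # only vrate-intvs differs
    make_source("result-2.json", "SSD-B", "25"),  # different device model
]
print(select_groups(group_sources(sources)))
```

In this hypothetical input the first two results land in the same group
because they differ only in `vrate-intvs`, and that group is selected since
it is the largest.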

Merging records and reports what happened in the result of `merge-info`, a
pseudo benchmark.

```
   [merge-info result] 2021-06-18 14:29:41 - 14:29:41

   [0] iocost-tune
     version: 1.0.0 x86_64-unknown-linux-gnu
     memory-profile: 16
     storage: WDC CL SN720 SDAQNTW-512G-1020
     classifier: dither,vrate-max=125
     sources:
       + result-0.json
       + result-1.json
       + result-2.json
```


Properties
==========

First group properties (applies to all sub-runs)
------------------------------------------------

#### `scale-min` (fraction, default: 0.01)

The minimum scale factor. No solution will scale below this. See `scale-max`.

#### `scale-max` (fraction, default: 1.0)

The maximum scale factor. No solution will scale above this. 1.0 means that
the solution won't ever scale up the model parameters.

#### Additional data set selector

Specify additional data sets to analyze:

* `isol-mean`: Average isolation factor
* `isol-PCT`: PCT'th percentile isolation factor
* `rlat-LAT_PCT-TIME_PCT`: IO read completion latencies. See `common` for
  details.
* `wlat-LAT_PCT-TIME_PCT`: IO write completion latencies. See `common` for
  details.


Second+ group properties
------------------------

Each group represents one QoS solution to compute. Every group should have
one `name` property and zero or one of the QoS solution target properties.
If no QoS solution target is specified, the `naive` solution is computed.

#### `name` (string)

The name of the solution.

#### `vrate` (vrate range), `rpct` (latency percentile), `wpct` (latency percentile)

Manual vrate range with `rpct` and/or `wpct` based dynamic adjustment. For
example:

```
   $ resctl-bench -r merged.json solve 'iocost-tune::name=test,vrate=75-100,rpct=50,wpct=0'
```

produces a solution which is adjusted according to 50th percentile read
latency between 75% and 100%:

```
   [test] vrate=75-100, rpct=50
     info: scale=100.0% MOF=1.479@16 aMOF=1.269 aMOF-delta=0.221 isol-01=92.51%
     rlat: 50-mean= 225u 50-99= 713u 50-100= 1.9m 99-mean= 3.8m 99-99=13.1m 100-100= 346m
     wlat: 50-mean=54.9u 50-99= 305u 50-100=13.0m 99-mean= 1.4m 99-99=22.7m 100-100= 378m
     model: rbps=2127854279 rseqiops=229322 rrandiops=222894 wbps=992692782 wseqiops=212859 wrandiops=91017
     qos: rpct=50.00 rlat=225 wpct=0.00 wlat=0 min=75.00 max=100.00
```
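
The spec string passed to `solve` above has an empty first property group
followed by one solution group. The Python sketch below splits such a string
into its parts. The syntax it assumes (groups separated by `:`, properties
separated by `,`, optional `=VALUE`) is inferred from the examples in this
document and does not handle values that themselves contain `:` or `,`.

```python
def parse_spec(spec):
    """Split a bench spec into its type and a list of property groups."""
    bench_type, *group_strs = spec.split(":")
    groups = []
    for group_str in group_strs:
        props = {}
        # Skip empty items so that an empty group yields an empty dict.
        for item in filter(None, group_str.split(",")):
            key, _, value = item.partition("=")
            props[key] = value
        groups.append(props)
    return bench_type, groups

bench_type, groups = parse_spec(
    "iocost-tune::name=test,vrate=75-100,rpct=50,wpct=0")
print(bench_type)
for index, props in enumerate(groups):
    print(index, props)
```

Here group 0 is empty, so the first group properties keep their defaults, and
group 1 describes the single solution named `test`.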

#### `rlat-LAT_PCT` and `wlat-LAT_PCT` (fraction range or q[1-4])

vrate range which maps to the specified segment of the latency slope. For
example:

```
   $ resctl-bench -r merged.json solve 'iocost-tune::name=test,rlat-99=q2'
```

is equivalent to

```
   $ resctl-bench -r merged.json solve 'iocost-tune::name=test,rlat-99=50%-75%'
```

and produces

```
   [test] rlat-99=0.5-0.75
     info: scale=55.92% MOF=1.402@16 aMOF=1.269 aMOF-delta=0.087 isol-01=94.88%
     rlat: 50-mean= 198u 50-99= 431u 50-100= 744u 99-mean= 3.3m 99-99=13.1m 100-100= 253m
     wlat: 50-mean=54.9u 50-99= 305u 50-100= 2.6m 99-mean= 742u 99-99= 9.8m 100-100= 378m
     model: rbps=1189789720 rseqiops=128225 rrandiops=124631 wbps=555064169 wseqiops=119020 wrandiops=50892
     qos: rpct=99.00 rlat=3265 wpct=0.00 wlat=0 min=75.45 max=100.00
```
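
The relationship between the `q[1-4]` shorthands and fraction ranges can be
sketched in Python as below. The `q1` and `q2` ranges follow from the text
in this document (`q1` covers 100% down to the 75% point and `q2` is
equivalent to `50%-75%`); the `q3` and `q4` ranges are extrapolated on the
assumption that the pattern continues, and the function name is
hypothetical.

```python
def quarter_to_range(quarter):
    """Map 'q1'..'q4' to a (low, high) fraction range of the latency slope."""
    if quarter not in ("q1", "q2", "q3", "q4"):
        raise ValueError(f"unknown quarter: {quarter!r}")
    n = int(quarter[1])
    high = 1.0 - (n - 1) * 0.25
    return (high - 0.25, high)

for q in ("q1", "q2", "q3", "q4"):
    low, high = quarter_to_range(q)
    print(f"{q}: {low:.0%}-{high:.0%}")
```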

#### `mof=max` and `amof=max`

The minimum vrate point where the specified MOF is at maximum.

#### `isolated-bandwidth` and `isolation`

Solves for the `isolated-bandwidth` and `isolation` solutions described
above, respectively.


Format properties
-----------------

#### `pdf` (string)

Generate a pdf file containing the result summary and graphs. If no value is
specified, `RESULT_PATH_STEM.pdf` is used where `RESULT_PATH_STEM` is the
file stem of the global `--result` path.

#### `high-level` (bool)

Asks for a very high level summary of the results. This is especially useful
when deciding which of many sets of merged results (see Merging above) has
more reliable parameters.