:orphan:

=========
Benchmark
=========

The following figure shows the prediction throughput of Treelite and XGBoost,
measured with various batch sizes.

.. plot:: _static/benchmark_plot.py
   :nofigs:

(Get this plot in `SVG <_static/benchmark_plot.svg>`_,
`PNG <_static/benchmark_plot.png>`_,
`High-resolution PNG <_static/benchmark_plot.hires.png>`_)

**System configuration**. One AWS EC2 instance of type c5.18xlarge was used. It
consists of the following components:

* CPU: 72 virtual cores, 64-bit
* Memory: 144 GB
* Storage: Elastic Block Storage (EBS)
* Operating System: Ubuntu 14.04.5 LTS

**Datasets**. Three datasets were used.

* Allstate Claim Prediction Challenge
* HIGGS Data Set
* Yahoo! Learning to Rank Challenge

**Methods**. For each dataset, we trained a 1600-tree ensemble using XGBoost.
Then we made predictions on batches of various sizes that were sampled randomly
from the training data. After running predictions with Treelite and XGBoost
(the latter with :py:meth:`xgboost.Booster.predict`), we measured throughput as
the number of rows predicted per second.

Download the benchmark scripts: `benchmark.py <_static/benchmark.py>`_,
`benchmark-xgb.py <_static/benchmark-xgb.py>`_
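
The measurement procedure above can be sketched as follows. This is a
minimal stand-alone harness, not the actual benchmark script: ``predict_fn``
is a placeholder for the real predictor call (for instance a bound
:py:meth:`xgboost.Booster.predict` or a Treelite predictor), and the dummy
model below exists only so the sketch runs without either library installed.

.. code-block:: python

   import time
   import numpy as np

   def measure_throughput(predict_fn, X, batch_size, num_trials=10, seed=0):
       """Sample random batches from X and report rows predicted per second.

       ``predict_fn`` is a stand-in for the real predictor; swap in e.g.
       a bound xgboost.Booster.predict to reproduce the measurement.
       """
       rng = np.random.default_rng(seed)
       total_rows, total_time = 0, 0.0
       for _ in range(num_trials):
           # Sample a random batch (with replacement) from the training data.
           idx = rng.integers(0, X.shape[0], size=batch_size)
           batch = X[idx]
           start = time.perf_counter()
           predict_fn(batch)
           total_time += time.perf_counter() - start
           total_rows += batch_size
       return total_rows / total_time

   # Usage with a dummy model standing in for the trained ensemble:
   X = np.random.rand(10000, 28).astype(np.float32)
   dummy_predict = lambda batch: batch.sum(axis=1)  # placeholder predictor
   throughput = measure_throughput(dummy_predict, X, batch_size=1000)

Timing only the ``predict_fn`` call (and not the batch sampling) keeps the
measurement focused on prediction cost, which is what the figure above compares.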

**Actual measurements**. You may download the exact measurements using the
following links:

* `allstate-treelite.csv <_static/allstate-treelite.csv>`_
* `allstate-xgb.csv <_static/allstate-xgb.csv>`_
* `higgs-treelite.csv <_static/higgs-treelite.csv>`_
* `higgs-xgb.csv <_static/higgs-xgb.csv>`_
* `yahoo-treelite.csv <_static/yahoo-treelite.csv>`_
* `yahoo-xgb.csv <_static/yahoo-xgb.csv>`_
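
The CSV files above can be parsed with the standard library alone. The column
names ``batch_size`` and ``throughput`` used in this sketch are assumptions
(check the header row of the downloaded files and adjust accordingly):

.. code-block:: python

   import csv
   import io

   def load_measurements(fileobj):
       """Parse a measurements CSV into (batch_size, throughput) pairs.

       The column names 'batch_size' and 'throughput' are assumed here;
       inspect the header of the actual CSVs and rename as needed.
       """
       reader = csv.DictReader(fileobj)
       return [(int(row["batch_size"]), float(row["throughput"]))
               for row in reader]

   # Usage with an inline stand-in for one of the CSVs above:
   sample = io.StringIO("batch_size,throughput\n100,50000\n1000,120000\n")
   rows = load_measurements(sample)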