Crates.io | elinor |
lib.rs | elinor |
version | 0.4.0 |
source | src |
created_at | 2024-09-15 07:12:29.375427 |
updated_at | 2024-11-02 09:58:17.608399 |
description | Evaluation Library in Information Retrieval |
homepage | https://github.com/kampersanda/elinor |
repository | https://github.com/kampersanda/elinor |
id | 1375298 |
size | 887,764 |
News: The CLI tools are now available in the elinor-cli directory!
Elinor is a Rust library for evaluating information retrieval (IR) systems. It provides a comprehensive set of metrics and statistical tests for evaluating and comparing IR systems.
You can build and open the documentation locally by running the following command:
RUSTDOCFLAGS="--html-in-header katex.html" cargo doc --no-deps --features serde --open
elinor-cli provides command-line tools for evaluating and comparing IR systems.
For example, you can obtain various statistics from several statistical tests, as shown below:
# Means
+--------+----------+----------+
| Metric | System_1 | System_2 |
+--------+----------+----------+
| ndcg@5 |   0.3450 |   0.2700 |
+--------+----------+----------+
# Two-sided paired Student's t-test for (System_1 - System_2)
+--------+--------+--------+--------+--------+---------+---------+
| Metric | Mean   | Var    | ES     | t-stat | p-value | 95% MOE |
+--------+--------+--------+--------+--------+---------+---------+
| ndcg@5 | 0.0750 | 0.0251 | 0.4731 | 2.1158 |  0.0478 |  0.0742 |
+--------+--------+--------+--------+--------+---------+---------+
# Two-sided paired Bootstrap test (n_resamples = 10000)
+--------+---------+
| Metric | p-value |
+--------+---------+
| ndcg@5 |  0.0511 |
+--------+---------+
# Fisher's randomized test (n_iters = 10000)
+--------+---------+
| Metric | p-value |
+--------+---------+
| ndcg@5 |  0.0498 |
+--------+---------+
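The columns of the paired t-test table above are related by simple arithmetic, which the following minimal sketch reproduces. It does not use Elinor's API; `PairedSummary` and `paired_summary` are hypothetical names introduced only for illustration. It assumes that ES is the mean difference divided by the sample standard deviation, and that the number of topics is 20 (not printed in this output, but consistent with the reported t-statistic). Because the inputs are the rounded values from the table, the results agree with the reported ES, t-stat, and 95% MOE only to about two decimal places.

```rust
/// Summary statistics of a two-sided paired Student's t-test,
/// derived from the per-topic score differences.
/// (Hypothetical helper type; not part of Elinor.)
#[derive(Debug)]
struct PairedSummary {
    effect_size: f64,
    t_stat: f64,
    std_err: f64,
}

/// `mean` and `var` are the sample mean and unbiased variance of the
/// differences; `n` is the number of topics (an assumption here).
fn paired_summary(mean: f64, var: f64, n: usize) -> PairedSummary {
    // Standard error of the mean difference.
    let std_err = (var / n as f64).sqrt();
    PairedSummary {
        // Assumed definition: mean difference in units of standard deviation.
        effect_size: mean / var.sqrt(),
        // Student's t-statistic for the paired test.
        t_stat: mean / std_err,
        std_err,
    }
}

fn main() {
    // Rounded Mean and Var taken from the table; n = 20 is assumed.
    let s = paired_summary(0.0750, 0.0251, 20);
    println!("ES = {:.4}, t-stat = {:.4}", s.effect_size, s.t_stat);

    // 95% MOE = critical t value for 19 degrees of freedom (about 2.093)
    // times the standard error.
    let t_crit = 2.093;
    println!("95% MOE = {:.4}", t_crit * s.std_err);
}
```

A two-sided p-value additionally requires the cumulative distribution function of Student's t-distribution, which is omitted here to keep the sketch dependency-free.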
# ndcg@5
## System means
+----------+--------+---------+
| System   | Mean   | 95% MOE |
+----------+--------+---------+
| System_1 | 0.3450 |  0.0670 |
| System_2 | 0.2700 |  0.0670 |
| System_3 | 0.2450 |  0.0670 |
+----------+--------+---------+
## Two-way ANOVA without replication
+-----------------+------------+----+----------+--------+---------+
| Factor          | Variation  | DF | Variance | F-stat | p-value |
+-----------------+------------+----+----------+--------+---------+
| Between-systems |     0.1083 |  2 |   0.0542 | 2.4749 |  0.0976 |
| Between-topics  |     1.0293 | 19 |   0.0542 | 2.4754 |  0.0086 |
| Residual        |     0.8317 | 38 |   0.0219 |        |         |
+-----------------+------------+----+----------+--------+---------+
## Effect sizes for Tukey HSD test
+----------+----------+----------+----------+
| ES       | System_1 | System_2 | System_3 |
+----------+----------+----------+----------+
| System_1 |   0.0000 |   0.5070 |   0.6760 |
| System_2 |  -0.5070 |   0.0000 |   0.1690 |
| System_3 |  -0.6760 |  -0.1690 |   0.0000 |
+----------+----------+----------+----------+
## p-values for randomized Tukey HSD test (n_iters = 10000)
+----------+----------+----------+----------+
| p-value  | System_1 | System_2 | System_3 |
+----------+----------+----------+----------+
| System_1 |   1.0000 |   0.2561 |   0.1040 |
| System_2 |   0.2561 |   1.0000 |   0.8926 |
| System_3 |   0.1040 |   0.8926 |   1.0000 |
+----------+----------+----------+----------+
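The ANOVA and effect-size tables above can be cross-checked with a few lines of arithmetic, sketched below without Elinor's API. `Row`, `f_stat`, and `effect_size` are hypothetical names introduced for illustration. The sketch assumes that Variance is Variation divided by DF, that each F-statistic is the factor variance divided by the residual variance, and that each effect size is the difference of two system means divided by the square root of the residual variance. The inputs are the rounded table values, so results match the reported figures only approximately.

```rust
/// One row of a two-way ANOVA table: sum of squares ("Variation")
/// and degrees of freedom. (Hypothetical helper type; not part of Elinor.)
struct Row {
    variation: f64,
    df: f64,
}

impl Row {
    /// Mean square, shown as "Variance" in the table.
    fn variance(&self) -> f64 {
        self.variation / self.df
    }
}

/// F-statistic of a factor relative to the residual.
fn f_stat(factor: &Row, residual: &Row) -> f64 {
    factor.variance() / residual.variance()
}

/// Assumed effect-size definition: difference of system means in units
/// of the residual standard deviation.
fn effect_size(mean_a: f64, mean_b: f64, residual: &Row) -> f64 {
    (mean_a - mean_b) / residual.variance().sqrt()
}

fn main() {
    // Rounded values copied from the ANOVA table.
    let systems = Row { variation: 0.1083, df: 2.0 };
    let topics = Row { variation: 1.0293, df: 19.0 };
    let residual = Row { variation: 0.8317, df: 38.0 };

    println!("F (between-systems) = {:.4}", f_stat(&systems, &residual));
    println!("F (between-topics)  = {:.4}", f_stat(&topics, &residual));

    // System means copied from the "System means" table.
    println!("ES (System_1 vs System_2) = {:.4}", effect_size(0.3450, 0.2700, &residual));
    println!("ES (System_1 vs System_3) = {:.4}", effect_size(0.3450, 0.2450, &residual));
}
```

The residual degrees of freedom (38) equal the product of the other two (2 × 19), as expected for a two-way layout without replication over 3 systems and 20 topics.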
In addition to simple unit tests, Elinor's evaluation results are validated to ensure accuracy and reliability:
This library is inspired by Sakai's books on IR evaluation and statistical testing:
I recommend reading these books before using this library.
Licensed under either of
at your option.