| Crates.io | scors |
| lib.rs | scors |
| version | 0.2.3 |
| created_at | 2025-03-03 02:59:38.161853+00 |
| updated_at | 2025-08-11 21:42:16.000087+00 |
| description | Scores for binary classifier evaluation |
| homepage | https://github.com/hanslovsky/scors |
| repository | https://github.com/hanslovsky/scors |
| max_upload_size | |
| id | 1575073 |
| size | 62,619 |
This package is a Rust re-implementation with Python bindings of some of the classification scores from scikit-learn (sklearn),
restricted to binary classification only. Scores generally have three input parameters for labels, predictions, and weights, with slightly different names in sklearn:

| sklearn | scors |
|---|---|
| y_true | labels |
| y_score | predictions |
| sample_weight | weights |
Functions in scors have an additional parameter `order` that can be
`None` to indicate unsorted data, `Order.ASCENDING` to indicate that the input data is sorted in ascending order with respect to predictions, or `Order.DESCENDING` to indicate that the input data is sorted in descending order with respect to predictions. Other parameters that may be present (e.g. `max_fpr` in `roc_auc`) follow the naming and meaning defined in the respective sklearn counterpart.
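The `order` parameter only controls whether an internal sort is needed. A minimal sketch of its semantics (illustrative only — plain strings stand in for the actual `Order` enum, and this is not scors's implementation):

```python
import numpy as np

def sort_descending(labels, predictions, order=None):
    """Return labels/predictions sorted descending by prediction.

    order=None          -> input unsorted, must be sorted (O(n log n))
    order="ascending"   -> input sorted ascending, just reverse (O(n))
    order="descending"  -> input already in the required order (no work)
    """
    labels = np.asarray(labels)
    predictions = np.asarray(predictions)
    if order is None:
        idx = np.argsort(predictions)[::-1]
        return labels[idx], predictions[idx]
    if order == "ascending":
        return labels[::-1], predictions[::-1]
    return labels, predictions
```

Declaring the order up front is what lets a caller with pre-sorted data skip the `n log n` step entirely.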
I want to improve the runtime performance of scores for my use case: I have a single large background sample that I combine and score with each of many small foreground samples.
For the rank-based metrics (e.g. `average_precision_score`),
the data is sorted by prediction, which has complexity O(n log n).
Exploiting the structure of my data lets me avoid this cost and boost performance.
But even without assumptions about structure in the data, I found ways to improve performance.
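For the background/foreground use case above, one way to exploit the structure (an illustrative sketch, not part of scors) is to sort the large background once, sort only each small foreground, and merge the two sorted arrays instead of re-sorting the combined data for every evaluation:

```python
import numpy as np

def merge_sorted(background_sorted, foreground_sorted):
    # Merge two already-sorted arrays without sorting the concatenation.
    # Cost is O(m log n + n + m) for n background and m foreground items,
    # instead of O((n + m) log(n + m)) for re-sorting every combination.
    a, b = background_sorted, foreground_sorted
    idx = np.searchsorted(a, b)           # insertion points of b into a
    out = np.empty(len(a) + len(b), dtype=a.dtype)
    pos_b = idx + np.arange(len(b))       # final positions of b's elements
    out[pos_b] = b
    mask = np.ones(len(out), dtype=bool)
    mask[pos_b] = False
    out[mask] = a                         # remaining slots get a, in order
    return out
```

Since the background dominates the input size, the per-foreground cost drops from roughly `(n + m) log(n + m)` to something close to linear in `n`.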
This is a summary of all the optimizations I implemented (or plan to):
- **Skip data validation.** sklearn uses `np.unique` to check the validity of the data. This can be helpful to ensure that assumptions are always met, especially in a library with a huge audience and general scope like sklearn, but it also carries a performance penalty. I decided to place the responsibility for data validation completely on the caller, who can add or leave out validation as appropriate.
- **Avoid allocating default weights.** When no weights are passed, sklearn creates an array filled with 1. Instead, the Rust implementation uses a constant value iterator.

The score functions are exposed under slightly different names than in sklearn:

| sklearn | scors |
|---|---|
| average_precision_score | average_precision |
| roc_auc_score | roc_auc |
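The constant value iterator for default weights can be sketched in Python with `itertools.repeat` (illustrative only; the actual implementation is in Rust):

```python
from itertools import repeat

def weighted_sum(values, weights=None):
    # Instead of materializing a full array of ones when weights is None,
    # use a lazy constant iterator: no O(n) allocation, no fill cost.
    w = repeat(1.0) if weights is None else weights
    return sum(v * wi for v, wi in zip(values, w))
```

The result is identical to passing an explicit array of ones, but the default case touches no extra memory.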
TODO: Benchmarks