light-svm

Crates.io	light-svm
lib.rs	light-svm
version	0.1.0
created_at	2025-11-12 17:27:46.135902+00
updated_at	2025-11-12 17:27:46.135902+00
description	Lightweight, fast LinearSVC-style crate with Pegasos/DCD solvers, CSR input, OvR/OvO strategies, and optional Platt calibration.
id	1929679
size	10,740,518

Justin Sing (singjc)


light-svm

A lightweight, LinearSVC-style crate for Rust:

- Linear SVM with hinge / squared-hinge loss
- Two solvers: Pegasos (primal SGD) and DCD (LIBLINEAR-style dual coordinate descent + shrinking)
- CSR sparse input
- Multiclass strategies: Binary, OneVsRest, OneVsOne
- Builder-style params with Solver::Auto heuristic
- Optional Platt calibration stub for probabilities (Binary)
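
For readers new to CSR, the layout behind the sparse input can be sketched as below. This is a minimal illustration, not the crate's `CsrMatrix`: the `Csr` type, its field names, and the reading of the second `from_dense` argument as "the value treated as zero" are all assumptions.

```rust
/// Hypothetical CSR container for illustration; not the crate's `CsrMatrix`.
#[derive(Debug, Clone, PartialEq)]
pub struct Csr {
    pub n_rows: usize,
    pub n_cols: usize,
    pub indptr: Vec<usize>,  // row i occupies indptr[i]..indptr[i + 1]
    pub indices: Vec<usize>, // column index of each stored value
    pub data: Vec<f32>,      // stored (non-zero) values
}

impl Csr {
    /// Build from dense rows, dropping entries equal to `zero` (assumed meaning).
    pub fn from_dense(rows: &[Vec<f32>], zero: f32) -> Self {
        let n_cols = rows.iter().map(|r| r.len()).max().unwrap_or(0);
        let mut indptr = vec![0usize];
        let mut indices = Vec::new();
        let mut data = Vec::new();
        for row in rows {
            for (j, &v) in row.iter().enumerate() {
                if v != zero {
                    indices.push(j);
                    data.push(v);
                }
            }
            // Close the current row: its entries end at the current length.
            indptr.push(indices.len());
        }
        Csr { n_rows: rows.len(), n_cols, indptr, indices, data }
    }

    /// Sparse dot product of row `i` with a dense weight vector `w`.
    pub fn row_dot(&self, i: usize, w: &[f32]) -> f32 {
        let (start, end) = (self.indptr[i], self.indptr[i + 1]);
        self.indices[start..end]
            .iter()
            .zip(&self.data[start..end])
            .map(|(&j, &v)| v * w[j])
            .sum()
    }
}
```

Only the stored entries are touched in `row_dot`, which is why linear solvers over sparse data scale with the number of nonzeros rather than rows × features.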

Quick start

use light_svm::{CsrMatrix, LinearSVC, ClassStrategy, SvmParams, Loss, Solver, PlattCalibrator, DecisionScores};

let x = CsrMatrix::from_dense(&vec![ vec![2.0, 1.0], vec![-1.0, -2.0] ], 0.0);
let y = vec![1, -1];

let params = SvmParams::builder()
    .c(1.0)
    .loss(Loss::Hinge)
    .fit_intercept(true)
    .tol(1e-3)
    .solver(Solver::Auto)
    .build();

let mut svc = LinearSVC::builder()
    .class_strategy(ClassStrategy::Binary)
    .params(params)
    .build();

svc.fit(&x, &y);

// Decision function
match svc.decision_function(&x) {
    DecisionScores::Binary { classes, scores } => {
        println!("classes={:?}, scores={:?}", classes, &scores[..]);
    }
    _ => unreachable!(),
}

// Calibrated probabilities (Binary)
let scores = match svc.decision_function(&x) {
    DecisionScores::Binary { scores, .. } => scores,
    _ => unreachable!(),
};
// Map labels to 0/1 targets, treating the larger label as the positive class.
let pos = *y.iter().max().unwrap();
let y01: Vec<u8> = y.iter().map(|&yy| if yy == pos { 1 } else { 0 }).collect();
let calib = PlattCalibrator::fit(&scores, &y01);
svc.with_calibration(calib);
let proba = svc.predict_proba(&x); // Vec<[P(neg), P(pos)]>

Inline builder style

let mut svc2 = LinearSVC::builder()
    .class_strategy(ClassStrategy::OneVsRest)
    .c(1.0).loss(Loss::Hinge).fit_intercept(true)
    .tol(1e-3).solver(Solver::Auto)
    .build();

Decision function shapes

The decision_function returns a strategy-aligned enumeration:

DecisionScores::Binary { classes: [neg,pos], scores: Vec<f32> }
- scores[i] is the raw margin w·x_i + b; positive => pos.
DecisionScores::OneVsRest { classes: Vec<i32>, scores: Vec<Vec<f32>> }
- scores is shaped rows × classes, aligned to classes.
DecisionScores::OneVsOne { pairs: Vec<(i32,i32)>, scores: Vec<Vec<f32>> }
- scores is rows × pairs; positive means vote for the first class in the pair.
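
To make the three shapes concrete, the sketch below turns each one into predicted labels. It uses a local `Scores` enum that mirrors the description above rather than the crate's own `DecisionScores`, so the exact field types are assumptions; the tie-breaking rules (zero margin goes to neg, OvO ties go to the smallest label) are also choices made here for illustration.

```rust
use std::collections::BTreeMap;

/// Local mirror of the shapes described above; the crate's enum may differ in detail.
pub enum Scores {
    Binary { classes: [i32; 2], scores: Vec<f32> },
    OneVsRest { classes: Vec<i32>, scores: Vec<Vec<f32>> },
    OneVsOne { pairs: Vec<(i32, i32)>, scores: Vec<Vec<f32>> },
}

/// Turn raw decision scores into one predicted label per row.
pub fn predict_labels(s: &Scores) -> Vec<i32> {
    match s {
        // Positive margin => pos (classes[1]); otherwise neg (classes[0]).
        Scores::Binary { classes, scores } => scores
            .iter()
            .map(|&m| if m > 0.0 { classes[1] } else { classes[0] })
            .collect(),
        // Argmax over the per-class scores of each row.
        Scores::OneVsRest { classes, scores } => scores
            .iter()
            .map(|row| {
                let mut best = 0;
                for (k, &v) in row.iter().enumerate() {
                    if v > row[best] {
                        best = k;
                    }
                }
                classes[best]
            })
            .collect(),
        // Majority vote: a positive score votes for the first class of the pair.
        Scores::OneVsOne { pairs, scores } => scores
            .iter()
            .map(|row| {
                let mut votes: BTreeMap<i32, usize> = BTreeMap::new();
                for (&(a, b), &v) in pairs.iter().zip(row) {
                    *votes.entry(if v > 0.0 { a } else { b }).or_insert(0) += 1;
                }
                // BTreeMap iterates in ascending label order, so ties keep the smallest label.
                let mut best: Option<(i32, usize)> = None;
                for (&label, &count) in &votes {
                    if best.map_or(true, |(_, c)| count > c) {
                        best = Some((label, count));
                    }
                }
                best.map(|(l, _)| l).expect("need at least one class pair")
            })
            .collect(),
    }
}
```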

`predict_proba` (Binary)

LinearSVC::predict_proba(&self, x) returns Vec<[P(neg), P(pos)]> for Binary models.
It uses a stored PlattCalibrator, attached via svc.with_calibration(calib).
Alternatively, call svc.predict_proba_with(x, &calib) without storing.

Note: Multiclass probability calibration typically fits a Platt or isotonic calibrator per binary sub-problem and then combines the outputs, by normalization for OvR or by pairwise coupling for OvO. The crate keeps a light stub for Binary only; multiclass calibration can be added later.
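
As background, Platt scaling maps a raw margin s to P(pos) = 1 / (1 + exp(A·s + B)). The sketch below fits A and B by plain gradient descent on the log loss. It is a simplified stand-in, not the crate's `PlattCalibrator`: the `PlattSketch` name, the `steps` and `lr` parameters, and the choice of gradient descent (Platt's original method uses smoothed targets and a Newton-style solver) are all assumptions.

```rust
/// Hypothetical stand-in for a Platt calibrator; not the crate's `PlattCalibrator`.
pub struct PlattSketch {
    pub a: f32,
    pub b: f32,
}

impl PlattSketch {
    /// Fit A and B by gradient descent on the mean negative log-likelihood.
    /// `y01` holds 0/1 targets aligned with `scores`.
    pub fn fit(scores: &[f32], y01: &[u8], steps: usize, lr: f32) -> Self {
        assert_eq!(scores.len(), y01.len());
        let n = scores.len().max(1) as f32;
        let (mut a, mut b) = (0.0f32, 0.0f32);
        for _ in 0..steps {
            let (mut ga, mut gb) = (0.0f32, 0.0f32);
            for (&s, &t) in scores.iter().zip(y01) {
                let p = 1.0 / (1.0 + (a * s + b).exp());
                // With f = A*s + B, d(NLL)/df = t - p for this parameterization.
                let d = t as f32 - p;
                ga += d * s;
                gb += d;
            }
            a -= lr * ga / n;
            b -= lr * gb / n;
        }
        PlattSketch { a, b }
    }

    /// Returns [P(neg), P(pos)] for one raw margin.
    pub fn proba(&self, score: f32) -> [f32; 2] {
        let p_pos = 1.0 / (1.0 + (self.a * score + self.b).exp());
        [1.0 - p_pos, p_pos]
    }
}
```

With this sign convention a well-behaved fit has A < 0, so larger margins give larger P(pos).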

Solver tolerance (`tol`): practical defaults

tol controls when DCD stops via the projected-gradient (PG) gap: stop when PGmax - PGmin ≤ tol.
Practical defaults:
- 1e-2 for quick training / rough models.
- 1e-3 (default) for balanced speed/accuracy.
- 1e-4 for tighter convergence (slower).
Pegasos ignores tol; use max_epochs to trade accuracy vs time.
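
The stopping rule can be illustrated with a stripped-down dual coordinate descent for the hinge loss. This is a sketch under simplifying assumptions, not the crate's solver: it takes dense rows, has no intercept, and omits shuffling and shrinking. The function name `dcd_hinge` and its signature are hypothetical.

```rust
/// Minimal dual coordinate descent for hinge-loss linear SVM (illustration only).
/// Labels `y` are +1.0 / -1.0. Returns the weights and the number of passes run.
pub fn dcd_hinge(
    x: &[Vec<f32>],
    y: &[f32],
    c: f32,
    tol: f32,
    max_passes: usize,
) -> (Vec<f32>, usize) {
    let n_features = x.first().map_or(0, |r| r.len());
    let mut w = vec![0.0f32; n_features];
    let mut alpha = vec![0.0f32; x.len()];
    // Diagonal of the dual Hessian: Q_ii = x_i . x_i
    let qii: Vec<f32> = x
        .iter()
        .map(|r| r.iter().map(|v| v * v).sum::<f32>())
        .collect();

    for pass in 1..=max_passes {
        let (mut pg_max, mut pg_min) = (f32::NEG_INFINITY, f32::INFINITY);
        for i in 0..x.len() {
            if qii[i] == 0.0 {
                continue; // an all-zero row cannot change w
            }
            let wx: f32 = x[i].iter().zip(&w).map(|(a, b)| a * b).sum();
            let g = y[i] * wx - 1.0; // dual gradient for coordinate i
            // Project the gradient onto the box constraint 0 <= alpha_i <= C.
            let pg = if alpha[i] <= 0.0 {
                g.min(0.0)
            } else if alpha[i] >= c {
                g.max(0.0)
            } else {
                g
            };
            pg_max = pg_max.max(pg);
            pg_min = pg_min.min(pg);
            if pg != 0.0 {
                let new_alpha = (alpha[i] - g / qii[i]).clamp(0.0, c);
                let delta = (new_alpha - alpha[i]) * y[i];
                for (wj, xj) in w.iter_mut().zip(&x[i]) {
                    *wj += delta * xj;
                }
                alpha[i] = new_alpha;
            }
        }
        // Stop once the projected-gradient gap falls within tol.
        if pg_max - pg_min <= tol {
            return (w, pass);
        }
    }
    (w, max_passes)
}
```

A looser `tol` ends the loop after fewer passes at the cost of a less exact solution, which is the trade-off the defaults above describe.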

Per-class C vs class weights

Set per-class C directly: .c_by_class(neg, pos) or .c_neg(v), .c_pos(v).
If provided, these override class_weight_*.
If not provided: C_-1 = c * class_weight_neg, C_+1 = c * class_weight_pos.
DCD uses C_i exactly; Pegasos scales updates by (C_i / c).
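
The precedence rules above can be written out as a small helper. The `CSpec` struct and its field names are hypothetical and only mirror the options named in this section; the crate's internal representation may differ.

```rust
/// Hypothetical parameter bundle mirroring the options named above.
#[derive(Debug, Clone, Copy)]
pub struct CSpec {
    pub c: f32,
    pub c_neg: Option<f32>, // explicit per-class C for label -1, if set
    pub c_pos: Option<f32>, // explicit per-class C for label +1, if set
    pub class_weight_neg: f32,
    pub class_weight_pos: f32,
}

impl CSpec {
    /// Returns (C_-1, C_+1): explicit per-class values win, else c * class weight.
    pub fn effective(&self) -> (f32, f32) {
        (
            self.c_neg.unwrap_or(self.c * self.class_weight_neg),
            self.c_pos.unwrap_or(self.c * self.class_weight_pos),
        )
    }

    /// Pegasos-style update scale for a sample with label `y` (+1 or -1): C_i / c.
    pub fn pegasos_scale(&self, y: i32) -> f32 {
        let (c_neg, c_pos) = self.effective();
        (if y > 0 { c_pos } else { c_neg }) / self.c
    }
}
```

For example, with c = 2.0, class_weight_neg = 0.5, and an explicit c_pos = 10.0, the effective values are C_-1 = 1.0 and C_+1 = 10.0, and the class weight for the positive class is ignored.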

Diagnostics (DCD)

.eval_every(k).verbose(true) prints metrics every k passes:
- pgmax, pgmin, kkt = pgmax - pgmin
- Primal / Dual objectives and duality gap
The binary LinearSvm summary carries kkt_history and gap_history.
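
For reference, the primal and dual objectives and their gap can be computed as in the sketch below. It assumes the standard hinge-loss formulation with a single C, dense rows, and no intercept; the function name `objectives` is hypothetical and the crate's bookkeeping may differ.

```rust
/// Returns (primal, dual, gap) for hinge-loss linear SVM given dual variables `alpha`.
/// primal = 0.5*||w||^2 + C * sum_i max(0, 1 - y_i * w.x_i)
/// dual   = sum_i alpha_i - 0.5*||w||^2, with w = sum_i alpha_i * y_i * x_i
pub fn objectives(x: &[Vec<f32>], y: &[f32], alpha: &[f32], c: f32) -> (f32, f32, f32) {
    let d = x.first().map_or(0, |r| r.len());
    // Recover the primal weights from the dual variables.
    let mut w = vec![0.0f32; d];
    for ((xi, &yi), &ai) in x.iter().zip(y).zip(alpha) {
        for (wj, xj) in w.iter_mut().zip(xi) {
            *wj += ai * yi * xj;
        }
    }
    let half_norm = 0.5 * w.iter().map(|v| v * v).sum::<f32>();
    // Total hinge loss over all samples.
    let hinge: f32 = x
        .iter()
        .zip(y)
        .map(|(xi, &yi)| {
            let wx: f32 = xi.iter().zip(&w).map(|(a, b)| a * b).sum();
            (1.0 - yi * wx).max(0.0)
        })
        .sum();
    let primal = half_norm + c * hinge;
    let dual = alpha.iter().sum::<f32>() - half_norm;
    (primal, dual, primal - dual)
}
```

The gap is non-negative for any feasible alpha and shrinks toward zero as the solver converges, so it complements the kkt value as a convergence check.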

Auto solver selection

Solver::Auto picks DCD for sparse, in-memory problems with ≤ 200k features, ≤ 2e8 nonzeros, and density ≤ 1%; otherwise Pegasos.
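
The thresholds above translate into a check like the following sketch. The `Choice` enum and `choose_solver` function are hypothetical names, density is assumed to mean nonzeros / (rows × features), and the "in-memory" condition is not modeled here.

```rust
/// Hypothetical result type for this sketch; the crate routes via `Solver`.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Choice {
    Dcd,
    Pegasos,
}

/// Pick a solver from the problem shape using the thresholds stated above.
pub fn choose_solver(n_rows: usize, n_features: usize, nnz: usize) -> Choice {
    // Use f64 for the cell count so large shapes do not overflow.
    let cells = (n_rows as f64) * (n_features as f64);
    let density = if cells > 0.0 { nnz as f64 / cells } else { 0.0 };
    if n_features <= 200_000 && nnz <= 200_000_000 && density <= 0.01 {
        Choice::Dcd
    } else {
        Choice::Pegasos
    }
}
```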

Extending the crate

- Add a new optimizer by implementing fit_binary_* in solver.rs and routing via Solver.
- Add kernel SVMs with a Kernel trait and a KernelSvm model; the LinearSVC API remains.
- Add multiclass probability calibration (pairwise coupling / isotonic) as a follow-up.

Sandbox / integration datasets

The repo keeps shared CSV fixtures under tests/data/.
Running the Python sandboxes or the Rust examples with --write-data (re)generates the expected files in that directory, and the integration tests read from the same location. Make sure the tests/data folder exists before running the examples:

python sandbox/data_flair.py --write-data      # writes tests/data/train.csv + tests/data/test.csv
python sandbox/iris_multiclass.py --write-data # writes tests/data/iris_train.csv + iris_test.csv

Performance notes

Tip: Rayon-powered helpers are enabled by default (disable with --no-default-features if desired). You can also set RUSTFLAGS="-C target-feature=+avx2" to get the most out of SIMD-heavy sections on CPUs that support AVX2.
