light-svm

version: 0.1.0
created_at: 2025-11-12 17:27:46.135902+00
updated_at: 2025-11-12 17:27:46.135902+00
description: Lightweight, fast LinearSVC-style crate with Pegasos/DCD solvers, CSR input, OvR/OvO strategies, and optional Platt calibration.
id: 1929679
size: 10,740,518
Justin Sing (singjc)


A lightweight, LinearSVC-style crate for Rust:

  • Linear SVM with hinge / squared-hinge
  • Two solvers: Pegasos (primal SGD) and DCD (LIBLINEAR-style dual coordinate descent + shrinking)
  • CSR sparse input
  • Multiclass strategies: Binary, OneVsRest, OneVsOne
  • Builder-style params with Solver::Auto heuristic
  • Optional Platt calibration stub for probabilities (Binary)
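
As a rough illustration of the first and third bullets, the sketch below computes the raw margin of one CSR row and the two loss variants. The `Csr` struct and the function names are hypothetical stand-ins written for this example; they are not the crate's `CsrMatrix` or its internal loss code.

```rust
/// Hypothetical minimal CSR container (an assumption, not the crate's `CsrMatrix`).
struct Csr {
    indptr: Vec<usize>,  // row i occupies positions indptr[i]..indptr[i + 1]
    indices: Vec<usize>, // column index of each stored value
    data: Vec<f32>,      // stored (nonzero) values
}

impl Csr {
    /// Raw margin w·x_i + b for row `i`, touching only the stored entries.
    fn score(&self, i: usize, w: &[f32], b: f32) -> f32 {
        let (start, end) = (self.indptr[i], self.indptr[i + 1]);
        let mut s = b;
        for k in start..end {
            s += self.data[k] * w[self.indices[k]];
        }
        s
    }
}

/// Hinge loss for one sample: max(0, 1 - y * score), with y in {-1, +1}.
fn hinge(y: f32, score: f32) -> f32 {
    (1.0 - y * score).max(0.0)
}

/// Squared hinge loss: max(0, 1 - y * score)^2.
fn squared_hinge(y: f32, score: f32) -> f32 {
    let h = hinge(y, score);
    h * h
}
```

Both losses are zero once a sample sits on the correct side with margin at least 1; the squared variant penalizes larger violations more heavily.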

Quick start

use light_svm::{CsrMatrix, LinearSVC, ClassStrategy, SvmParams, Loss, Solver, PlattCalibrator, DecisionScores};

let x = CsrMatrix::from_dense(&vec![ vec![2.0, 1.0], vec![-1.0, -2.0] ], 0.0);
let y = vec![1, -1];

let params = SvmParams::builder()
    .c(1.0)
    .loss(Loss::Hinge)
    .fit_intercept(true)
    .tol(1e-3)
    .solver(Solver::Auto)
    .build();

let mut svc = LinearSVC::builder()
    .class_strategy(ClassStrategy::Binary)
    .params(params)
    .build();

svc.fit(&x, &y);

// Decision function
match svc.decision_function(&x) {
    DecisionScores::Binary { classes, scores } => {
        println!("classes={:?}, scores={:?}", classes, &scores[..]);
    }
    _ => unreachable!(),
}

// Calibrated probabilities (Binary)
let scores = match svc.decision_function(&x) {
    DecisionScores::Binary { scores, .. } => scores,
    _ => unreachable!(),
};
let pos = *y.iter().max().unwrap();
let y01: Vec<u8> = y.iter().map(|&yy| if yy == pos { 1 } else { 0 }).collect();
let calib = PlattCalibrator::fit(&scores, &y01);
svc.with_calibration(calib);
let proba = svc.predict_proba(&x); // Vec<[P(neg), P(pos)]>

Inline builder style

let mut svc2 = LinearSVC::builder()
    .class_strategy(ClassStrategy::OneVsRest)
    .c(1.0).loss(Loss::Hinge).fit_intercept(true)
    .tol(1e-3).solver(Solver::Auto)
    .build();

Decision function shapes

decision_function returns a DecisionScores enum whose variant matches the model's class strategy:

  • DecisionScores::Binary { classes: [neg,pos], scores: Vec<f32> }
    • scores[i] is the raw margin w·x_i + b; positive => pos.
  • DecisionScores::OneVsRest { classes: Vec<i32>, scores: Vec<Vec<f32>> }
    • scores is shaped rows × classes, aligned to classes.
  • DecisionScores::OneVsOne { pairs: Vec<(i32,i32)>, scores: Vec<Vec<f32>> }
    • scores is rows × pairs; positive means vote for the first class in the pair.
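
The shapes above map onto label predictions roughly as follows. This is a standalone sketch that works on plain slices rather than on the crate's `DecisionScores` type. The function names, the `[i32; 2]` class array, the treatment of a score of exactly 0 as negative, and the smallest-label tie-break for OvO are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Binary: a positive margin selects classes[1] (pos), anything else classes[0] (neg).
fn predict_binary(classes: [i32; 2], scores: &[f32]) -> Vec<i32> {
    scores
        .iter()
        .map(|&s| if s > 0.0 { classes[1] } else { classes[0] })
        .collect()
}

/// One-vs-rest: pick the class whose column holds the largest score in each row.
fn predict_ovr(classes: &[i32], scores: &[Vec<f32>]) -> Vec<i32> {
    scores
        .iter()
        .map(|row| {
            let mut best = 0;
            for (j, &s) in row.iter().enumerate() {
                if s > row[best] {
                    best = j;
                }
            }
            classes[best]
        })
        .collect()
}

/// One-vs-one: each pair casts one vote (positive score => first class of the pair).
/// Ties go to the smallest label, which is an assumption of this sketch.
fn predict_ovo(pairs: &[(i32, i32)], scores: &[Vec<f32>]) -> Vec<i32> {
    scores
        .iter()
        .map(|row| {
            let mut votes: BTreeMap<i32, usize> = BTreeMap::new();
            for (&(a, b), &s) in pairs.iter().zip(row) {
                *votes.entry(if s > 0.0 { a } else { b }).or_insert(0) += 1;
            }
            // BTreeMap iterates labels in ascending order, so a strict `>` keeps the smallest label on ties.
            let mut best: Option<(i32, usize)> = None;
            for (&label, &n) in &votes {
                if best.map_or(true, |(_, m)| n > m) {
                    best = Some((label, n));
                }
            }
            best.map(|(label, _)| label).expect("at least one pair is required")
        })
        .collect()
}
```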

predict_proba (Binary)

  • LinearSVC::predict_proba(&self, x) returns Vec<[P(neg), P(pos)]> for Binary models.
  • It uses a stored PlattCalibrator, attached via svc.with_calibration(calib).
  • Alternatively, call svc.predict_proba_with(x, &calib) to pass a calibrator without storing it on the model.
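
For intuition, Platt scaling maps a raw margin to a probability through a fitted sigmoid. The sketch below uses the standard form P(pos | s) = 1 / (1 + exp(a·s + b)); the function name and the parameters `a` and `b` are hypothetical, and the crate's `PlattCalibrator` may parameterize or fit the sigmoid differently.

```rust
/// Platt-style mapping from raw margins to [P(neg), P(pos)] pairs.
/// `a` and `b` stand in for fitted parameters; `a` is typically negative,
/// so larger margins yield a higher P(pos).
fn platt_proba(scores: &[f32], a: f32, b: f32) -> Vec<[f32; 2]> {
    scores
        .iter()
        .map(|&s| {
            let p_pos = 1.0 / (1.0 + (a * s + b).exp());
            [1.0 - p_pos, p_pos]
        })
        .collect()
}
```

With a = -1 and b = 0, a margin of 0 maps to [0.5, 0.5], and positive margins push P(pos) above 0.5.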

Note: Multiclass probability calibration typically uses per-class Platt or isotonic calibration followed by normalization (OvR), or pairwise coupling (OvO). The crate keeps a light stub for Binary only; multiclass calibration can be added later.

Solver tolerance (tol): practical defaults

  • tol controls when DCD stops via the projected-gradient (PG) gap: stop when PGmax - PGmin ≤ tol.
  • Practical defaults:
    • 1e-2 for quick training / rough models.
    • 1e-3 (default) for balanced speed/accuracy.
    • 1e-4 for tighter convergence (slower).
  • Pegasos ignores tol; use max_epochs to trade accuracy vs time.
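
The stopping rule in the first bullet can be sketched as below. It assumes the LIBLINEAR-style hinge-loss dual, where each dual variable alpha_i lies in [0, upper] and g_i is the dual gradient for coordinate i; the function names are hypothetical and the crate's solver may track these quantities differently (for example, incrementally within a pass).

```rust
/// Projected gradient for one coordinate of a box-constrained dual:
/// at the lower bound only negative gradients count, at the upper bound
/// only positive ones, and in the interior the gradient is used as is.
fn projected_gradient(g: f32, alpha: f32, upper: f32) -> f32 {
    if alpha <= 0.0 {
        g.min(0.0)
    } else if alpha >= upper {
        g.max(0.0)
    } else {
        g
    }
}

/// PGmax - PGmin over all coordinates of one pass.
fn pg_gap(grads: &[f32], alphas: &[f32], upper: f32) -> f32 {
    let mut pg_max = f32::NEG_INFINITY;
    let mut pg_min = f32::INFINITY;
    for (&g, &a) in grads.iter().zip(alphas) {
        let pg = projected_gradient(g, a, upper);
        pg_max = pg_max.max(pg);
        pg_min = pg_min.min(pg);
    }
    pg_max - pg_min
}

/// Stop when the PG gap is at most `tol`.
fn converged(grads: &[f32], alphas: &[f32], upper: f32, tol: f32) -> bool {
    pg_gap(grads, alphas, upper) <= tol
}
```

A smaller tol demands that every coordinate be closer to satisfying its optimality condition, which is why 1e-4 trains more slowly than 1e-2.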

Per-class C vs class weights

  • Set per-class C directly: .c_by_class(neg, pos) or .c_neg(v), .c_pos(v).
  • If provided, these override class_weight_*.
  • If not provided: C_-1 = c * class_weight_neg, C_+1 = c * class_weight_pos.
  • DCD uses C_i exactly; Pegasos scales updates by (C_i / c).
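
The precedence rules above amount to the following resolution step. `CParams` is a hypothetical struct written for this sketch, not the crate's `SvmParams`, and it assumes each per-class override applies independently of the other.

```rust
/// Hypothetical parameter bundle mirroring the options described above.
struct CParams {
    c: f32,
    class_weight_neg: f32,
    class_weight_pos: f32,
    c_neg: Option<f32>, // explicit per-class C for label -1, if set
    c_pos: Option<f32>, // explicit per-class C for label +1, if set
}

/// Returns (C_-1, C_+1): an explicit per-class value wins,
/// otherwise the class gets c * class_weight.
fn effective_c(p: &CParams) -> (f32, f32) {
    (
        p.c_neg.unwrap_or(p.c * p.class_weight_neg),
        p.c_pos.unwrap_or(p.c * p.class_weight_pos),
    )
}

/// Factor applied to a Pegasos update for a sample whose class has penalty `c_i`.
fn pegasos_scale(c_i: f32, c: f32) -> f32 {
    c_i / c
}
```

For example, c = 2 with class weights (1, 3) and no overrides gives (2, 6), so Pegasos updates on positive samples are scaled by 3.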

Diagnostics (DCD)

  • .eval_every(k).verbose(true) prints metrics every k passes:
    • pgmax, pgmin, kkt = pgmax - pgmin
    • Primal / Dual objectives and duality gap
  • The binary LinearSvm summary carries kkt_history and gap_history.
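
To make the reported quantities concrete, the sketch below computes hinge-loss primal and dual objectives and their gap for a single shared C and no intercept term. These simplifications, the function names, and the convention margins[i] = y_i · (w·x_i) are assumptions for illustration; the crate's diagnostics may include per-class C and the bias.

```rust
/// Primal: 0.5 * ||w||^2 + C * sum_i max(0, 1 - margin_i).
fn primal_objective(w: &[f32], margins: &[f32], c: f32) -> f32 {
    let reg = 0.5 * w.iter().map(|v| v * v).sum::<f32>();
    let loss: f32 = margins.iter().map(|&m| (1.0 - m).max(0.0)).sum();
    reg + c * loss
}

/// Dual (maximization form): sum_i alpha_i - 0.5 * ||w||^2,
/// valid when w = sum_i alpha_i * y_i * x_i.
fn dual_objective(w: &[f32], alphas: &[f32]) -> f32 {
    alphas.iter().sum::<f32>() - 0.5 * w.iter().map(|v| v * v).sum::<f32>()
}

/// Duality gap: non-negative for feasible alphas, zero at the optimum.
fn duality_gap(w: &[f32], margins: &[f32], alphas: &[f32], c: f32) -> f32 {
    primal_objective(w, margins, c) - dual_objective(w, alphas)
}
```

For a single sample x = [1], y = +1 with C = 1, alpha = 0.5 gives w = [0.5] and a gap of 0.25, while the optimum alpha = 1 gives w = [1] and a gap of 0.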

Auto solver selection

  • Solver::Auto picks DCD for sparse, in-memory problems with ≤ 200k features, ≤ 2e8 nonzeros, and density ≤ 1%; otherwise Pegasos.
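
The thresholds above can be read as a simple predicate. The sketch below is a standalone reconstruction from that one sentence: the `Choice` enum and `auto_solver` function are hypothetical, the "in-memory" condition is assumed to hold, and density is taken to be nnz / (rows × features).

```rust
/// Hypothetical result type for this sketch (not the crate's `Solver`).
#[derive(Debug, PartialEq)]
enum Choice {
    Dcd,
    Pegasos,
}

/// DCD when features <= 200k, nonzeros <= 2e8, and density <= 1%; otherwise Pegasos.
fn auto_solver(n_rows: usize, n_features: usize, nnz: usize) -> Choice {
    let cells = n_rows as f64 * n_features as f64;
    let density = if cells > 0.0 { nnz as f64 / cells } else { 0.0 };
    if n_features <= 200_000 && nnz <= 200_000_000 && density <= 0.01 {
        Choice::Dcd
    } else {
        Choice::Pegasos
    }
}
```

Under these assumptions, a 10,000 × 50,000 matrix with one million nonzeros (density 0.2%) would route to DCD, while a small dense 100 × 100 problem would route to Pegasos.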

Extending the crate

  • Add a new optimizer by implementing fit_binary_* in solver.rs and routing via Solver.
  • Add kernel SVMs with a Kernel trait and a KernelSvm model; the LinearSVC API remains.
  • Add multiclass probability calibration (pairwise coupling / isotonic) as a follow-up.

Sandbox / integration datasets

The repo keeps shared CSV fixtures under tests/data/.
If you run the Python sandboxes or the Rust examples with --write-data, they will (re)generate the expected files in that directory, and the integration tests read from the same location. Make sure the tests/data folder exists before running the examples:

python sandbox/data_flair.py --write-data      # writes tests/data/train.csv + tests/data/test.csv
python sandbox/iris_multiclass.py --write-data # writes tests/data/iris_train.csv + iris_test.csv

Performance notes

Tip: Rayon-powered helpers are enabled by default; disable them with --no-default-features if desired. On capable CPUs, you can also set RUSTFLAGS="-C target-feature=+avx2" to get more out of the SIMD-heavy sections.
