ngboost-rs

Crates.io: ngboost-rs
lib.rs: ngboost-rs
version: 1.1.1
created_at: 2026-01-15 23:39:31.012044+00
updated_at: 2026-01-19 10:07:11.942503+00
description: Natural Gradient Boosting for Probabilistic Prediction - A Rust implementation of NGBoost
repository: https://github.com/agene0001/ngboost-rs
id: 2047342
size: 648,242
owner: PeppaPig (agene0001)
documentation: https://docs.rs/ngboost-rs

README

ngboost-rs

A Rust implementation of NGBoost (Natural Gradient Boosting for Probabilistic Prediction).

NGBoost is a modular boosting algorithm that allows you to obtain full probability distributions for your predictions, not just point estimates. This enables uncertainty quantification, prediction intervals, and probabilistic forecasting.

Features

  • Probabilistic Predictions: Get full probability distributions, not just point estimates

  • Multiple Distributions: Support for Normal, Poisson, Gamma, Exponential, Laplace, Weibull, and more

  • Classification Support: Bernoulli and multi-class Categorical distributions

  • Flexible Scoring Rules: LogScore and CRPScore implementations

  • Natural Gradient Boosting: Uses the natural gradient for efficient optimization on probability distribution manifolds

  • Generic Design: Easily extensible with custom distributions and base learners
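To give a feel for the "natural gradient" bullet above: for a Normal distribution parameterized as (loc, log scale), the ordinary gradient of the negative log-likelihood is rescaled by the inverse Fisher information matrix, which makes step sizes comparable across the distribution's parameter manifold. A minimal, dependency-free sketch of that rescaling (the parameterization follows the usual NGBoost convention, but the function names here are illustrative, not the crate's API):

```rust
/// Ordinary gradient of the Normal negative log-likelihood
/// with respect to (mu, log_sigma) at observation y.
fn nll_grad(mu: f64, log_sigma: f64, y: f64) -> (f64, f64) {
    let sigma = log_sigma.exp();
    let z = (y - mu) / sigma;
    // d/d mu = (mu - y) / sigma^2 ;  d/d log_sigma = 1 - z^2
    ((mu - y) / (sigma * sigma), 1.0 - z * z)
}

/// Natural gradient: inverse Fisher information times the ordinary gradient.
/// For (mu, log_sigma) the Fisher matrix is diag(1/sigma^2, 2).
fn natural_grad(mu: f64, log_sigma: f64, y: f64) -> (f64, f64) {
    let sigma = log_sigma.exp();
    let (g_mu, g_ls) = nll_grad(mu, log_sigma, y);
    (sigma * sigma * g_mu, g_ls / 2.0)
}

fn main() {
    // Note the mu-component of the natural gradient is simply mu - y,
    // independent of the current scale estimate.
    let (n_mu, n_ls) = natural_grad(0.0, 2.0_f64.ln(), 1.0);
    println!("natural gradient: ({n_mu}, {n_ls})");
}
```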

Installation

⚠️ Important: No BLAS backend is enabled by default.

This library relies on a BLAS/LAPACK backend for matrix operations. To ensure cross-platform compatibility (e.g., macOS vs. Windows), no backend is selected by default. You must explicitly enable one of the features below in your Cargo.toml, otherwise the project will fail to link.

Add this to your Cargo.toml:

[dependencies]
# Replace "openblas" with "accelerate" (macOS) or "intel-mkl" (Windows/Linux) as needed
ngboost-rs = { version = "0.1", features = ["openblas"] }
ndarray = "0.15"

Platform-Specific BLAS Backend

Choose a BLAS/LAPACK backend based on your platform:

macOS (Accelerate - recommended)

[dependencies]
ngboost-rs = { version = "0.1", features = ["accelerate"] }

Linux (OpenBLAS)

First install OpenBLAS:

# Ubuntu/Debian
sudo apt-get install libopenblas-dev

# Fedora
sudo dnf install openblas-devel

# Arch
sudo pacman -S openblas

Then in Cargo.toml:

[dependencies]
ngboost-rs = { version = "0.1", features = ["openblas"] }

Windows (OpenBLAS or Intel MKL)

On Windows, you can either build OpenBLAS from source or use Intel MKL, which is usually easier:

[dependencies]
ngboost-rs = { version = "0.1", features = ["intel-mkl"] }

Intel MKL (any platform - high performance)

The same intel-mkl feature also works on Linux and macOS. Note: Intel MKL requires the MKL libraries to be installed.

Quick Start

Regression Example

use ndarray::{Array1, Array2};
use ngboost_rs::dist::Normal;
use ngboost_rs::learners::StumpLearner;
use ngboost_rs::ngboost::NGBoost;
use ngboost_rs::scores::LogScore;

fn main() {
    // Small synthetic training set -- replace with your own data
    let x_train: Array2<f64> =
        Array2::from_shape_vec((4, 1), vec![1.0, 2.0, 3.0, 4.0]).unwrap();
    let y_train: Array1<f64> = Array1::from_vec(vec![1.1, 1.9, 3.2, 3.8]);
    let x_test: Array2<f64> = Array2::from_shape_vec((2, 1), vec![1.5, 3.5]).unwrap();

    // Create and train the model
    let mut model: NGBoost<Normal, LogScore, StumpLearner> =
        NGBoost::new(100, 0.1, StumpLearner);

    model.fit(&x_train, &y_train).expect("Failed to fit");

    // Make point predictions
    let predictions = model.predict(&x_test);
    println!("Point predictions: {:?}", predictions);

    // Get full predicted distributions (with uncertainty)
    let pred_dist = model.pred_dist(&x_test);
    println!("Predicted mean: {:?}", pred_dist.loc);
    println!("Predicted std:  {:?}", pred_dist.scale);
}

Classification Example

use ngboost_rs::dist::{Bernoulli, ClassificationDistn}; // ClassificationDistn provides class_probs()
use ngboost_rs::learners::StumpLearner;
use ngboost_rs::ngboost::NGBoost;
use ngboost_rs::scores::LogScore;

// Binary classification: y_train holds 0.0/1.0 labels,
// with x_train/x_test prepared as in the regression example
let mut model: NGBoost<Bernoulli, LogScore, StumpLearner> =
    NGBoost::new(50, 0.1, StumpLearner);

model.fit(&x_train, &y_train).expect("Failed to fit");

// Get class predictions
let predictions = model.predict(&x_test);

// Get class probabilities
let pred_dist = model.pred_dist(&x_test);
let probabilities = pred_dist.class_probs();  // Shape: (n_samples, n_classes)

Available Distributions

Regression Distributions

| Distribution | Parameters | Use Case |
|---|---|---|
| Normal | loc, scale | General continuous data |
| NormalFixedVar | loc | When variance is known/fixed |
| NormalFixedMean | scale | When mean is known/fixed |
| LogNormal | loc, scale | Positive, right-skewed data |
| Exponential | scale | Waiting times, survival |
| Gamma | shape, rate | Positive continuous data |
| Poisson | rate | Count data |
| Laplace | loc, scale | Heavy-tailed data |
| Weibull | shape, scale | Survival analysis |
| HalfNormal | scale | Positive data near zero |
| StudentT | loc, scale, df | Heavy tails, robust |
| TFixedDf | loc, scale | T with fixed df=3 |
| Cauchy | loc, scale | Very heavy tails |

Classification Distributions

| Distribution | Parameters | Use Case |
|---|---|---|
| Bernoulli | 1 logit | Binary classification |
| `Categorical<K>` | K-1 logits | K-class classification |
| Categorical3 | 2 logits | 3-class classification |
| Categorical10 | 9 logits | 10-class (e.g., digits) |
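The K-1 logit counts in the table come from a common identifiability trick: one reference class is pinned at logit 0, and probabilities are recovered with a softmax over [0, l₁, …, l₍K−1₎]. A dependency-free sketch of that mapping (illustrative only, not the crate's internal code):

```rust
/// Map K-1 free logits to K class probabilities, with class 0 as the
/// reference class pinned at logit 0.
fn class_probs_from_logits(logits: &[f64]) -> Vec<f64> {
    // Prepend the reference logit, then apply a numerically stable softmax.
    let full: Vec<f64> = std::iter::once(0.0).chain(logits.iter().copied()).collect();
    let max = full.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = full.iter().map(|l| (l - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

fn main() {
    // Two zero logits -> three equally likely classes.
    println!("{:?}", class_probs_from_logits(&[0.0, 0.0]));
}
```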

Multivariate Distributions

| Distribution | Parameters | Use Case |
|---|---|---|
| `MultivariateNormal<P>` | P*(P+3)/2 | Multi-output regression |

Scoring Rules

NGBoost supports different scoring rules for training:

  • LogScore: Negative log-likelihood (default, most common)
  • CRPScore: Continuous Ranked Probability Score (proper scoring rule)

use ngboost_rs::scores::{LogScore, CRPScore};

// Using LogScore (default)
let model: NGBoost<Normal, LogScore, StumpLearner> = NGBoost::new(100, 0.1, StumpLearner);

// Using CRPScore
let model: NGBoost<Normal, CRPScore, StumpLearner> = NGBoost::new(100, 0.1, StumpLearner);
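For the Normal distribution, the CRPS has a closed form, CRPS(N(μ, σ²), y) = σ[z(2Φ(z) − 1) + 2φ(z) − 1/√π] with z = (y − μ)/σ, which is part of what makes it practical as a training objective. A standalone sketch of that formula (using an Abramowitz-Stegun erf approximation since std Rust has no erf; the function names are illustrative, not the crate's API):

```rust
/// erf approximation (Abramowitz & Stegun 7.1.26), |error| < 1.5e-7.
fn erf(x: f64) -> f64 {
    let sign = x.signum();
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    let poly = t * (0.254829592
        + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
    sign * (1.0 - poly * (-x * x).exp())
}

/// CRPS of a Normal(mu, sigma) forecast against observation y (closed form).
fn crps_normal(mu: f64, sigma: f64, y: f64) -> f64 {
    let z = (y - mu) / sigma;
    let pdf = (-0.5 * z * z).exp() / (2.0 * std::f64::consts::PI).sqrt();
    let cdf = 0.5 * (1.0 + erf(z / std::f64::consts::SQRT_2));
    sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / std::f64::consts::PI.sqrt())
}

fn main() {
    // At y == mu the score reduces to sigma * (2/sqrt(2*pi) - 1/sqrt(pi)),
    // roughly 0.2337 * sigma.
    println!("{:.4}", crps_normal(0.0, 1.0, 0.0));
}
```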

Uncertainty Quantification

One of the key advantages of NGBoost is uncertainty estimation:

let pred_dist = model.pred_dist(&x_test);

// For Normal distribution
for i in 0..n_samples {
    let mean = pred_dist.loc[i];
    let std = pred_dist.scale[i];
    
    // 95% confidence interval
    let ci_lower = mean - 1.96 * std;
    let ci_upper = mean + 1.96 * std;
    
    println!("Prediction: {:.2} [{:.2}, {:.2}]", mean, ci_lower, ci_upper);
}
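A quick way to sanity-check such intervals is empirical coverage: on held-out data, the fraction of targets falling inside their predicted interval should be close to the nominal 95%. A dependency-free helper along these lines (hypothetical, not part of the crate's API):

```rust
/// Fraction of observations falling inside mean +/- z * std.
fn empirical_coverage(means: &[f64], stds: &[f64], ys: &[f64], z: f64) -> f64 {
    let n = ys.len();
    let mut inside = 0;
    for i in 0..n {
        if (ys[i] - means[i]).abs() <= z * stds[i] {
            inside += 1;
        }
    }
    inside as f64 / n as f64
}

fn main() {
    let means = [0.0, 1.0, 2.0, 3.0];
    let stds = [1.0, 1.0, 1.0, 1.0];
    let ys = [0.5, 1.2, 4.5, 2.9]; // third point lies outside its interval
    println!("{}", empirical_coverage(&means, &stds, &ys, 1.96));
}
```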

Examples

Run the examples to see NGBoost in action:

# Basic regression
cargo run --example regression

# Binary classification
cargo run --example classification

# Comparing different distributions
cargo run --example distributions

# Uncertainty quantification
cargo run --example uncertainty

API Reference

NGBoost

impl<D, S, B> NGBoost<D, S, B>
where
    D: Distribution + Scorable<S> + Clone,
    S: Score,
    B: BaseLearner + Clone,
{
    /// Create a new NGBoost model
    /// - n_estimators: Number of boosting iterations
    /// - learning_rate: Step size for updates (typically 0.01-0.1)
    /// - base_learner: The base learner to use
    pub fn new(n_estimators: usize, learning_rate: f64, base_learner: B) -> Self;

    /// Fit the model to training data
    pub fn fit(&mut self, x: &Array2<f64>, y: &Array1<f64>) -> Result<(), &'static str>;

    /// Make point predictions
    pub fn predict(&self, x: &Array2<f64>) -> Array1<f64>;

    /// Get predicted probability distributions
    pub fn pred_dist(&self, x: &Array2<f64>) -> D;
}

Distribution Trait

pub trait Distribution: Sized + Clone + Debug {
    /// Create distribution from parameters
    fn from_params(params: &Array2<f64>) -> Self;
    
    /// Fit initial parameters from data
    fn fit(y: &Array1<f64>) -> Array1<f64>;
    
    /// Number of distribution parameters
    fn n_params(&self) -> usize;
    
    /// Point prediction (e.g., mean)
    fn predict(&self) -> Array1<f64>;
}
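To illustrate the extension point, here is a self-contained sketch of implementing a one-parameter distribution against a simplified stand-in for the trait above. To keep the sketch dependency-free, `Vec<f64>` replaces the ndarray types used by the real crate, and the `ConstVarNormal` type is hypothetical (a Normal with variance fixed at 1):

```rust
use std::fmt::Debug;

// Simplified stand-in for the crate's Distribution trait:
// Vec<f64> replaces the Array1/Array2 types of the real API.
trait Distribution: Sized + Clone + Debug {
    fn from_params(params: &[Vec<f64>]) -> Self;
    fn fit(y: &[f64]) -> Vec<f64>;
    fn n_params(&self) -> usize;
    fn predict(&self) -> Vec<f64>;
}

/// Hypothetical Normal with variance fixed at 1: its only parameter is loc.
#[derive(Clone, Debug)]
struct ConstVarNormal {
    loc: Vec<f64>,
}

impl Distribution for ConstVarNormal {
    fn from_params(params: &[Vec<f64>]) -> Self {
        // One parameter row per sample: the location.
        Self { loc: params.iter().map(|row| row[0]).collect() }
    }

    fn fit(y: &[f64]) -> Vec<f64> {
        // Initial parameter: the marginal mean of the targets.
        vec![y.iter().sum::<f64>() / y.len() as f64]
    }

    fn n_params(&self) -> usize {
        1
    }

    fn predict(&self) -> Vec<f64> {
        // Point prediction is the mean, i.e. loc.
        self.loc.clone()
    }
}

fn main() {
    let init = ConstVarNormal::fit(&[1.0, 2.0, 3.0]);
    let dist = ConstVarNormal::from_params(&[vec![1.5], vec![2.5]]);
    println!("{:?} {:?}", init, dist.predict());
}
```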

Performance Tips

  1. Learning Rate: Start with 0.1 and decrease if overfitting
  2. Number of Estimators: More is usually better, but watch for overfitting
  3. Distribution Choice: Match the distribution to your data characteristics
  4. Natural Gradient: Enabled by default, provides faster convergence
  5. Release Mode: Always use cargo build --release for production - it's significantly faster

Building for Performance

For best performance, always compile in release mode:

cargo build --release
cargo run --release --example regression --features accelerate

The release profile includes:

  • Full optimizations (opt-level = 3)
  • Link-time optimization (lto = "thin")

Debug builds are intentionally slower but compile faster during development.
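If your own project does not already set these, the two release-profile bullets above correspond to a Cargo.toml section like the following (a typical sketch; adjust to taste):

```toml
[profile.release]
opt-level = 3   # full optimizations
lto = "thin"    # link-time optimization across crates
```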

Comparison with Python NGBoost

This Rust implementation aims to be compatible with the Python NGBoost library:

| Feature | Python | Rust |
|---|---|---|
| Core Algorithm | Yes | Yes |
| Natural Gradient | Yes | Yes |
| LogScore | Yes | Yes |
| CRPScore | Yes | Yes (Normal, Laplace) |
| Regression Distributions | 16 | 16 |
| Classification | Yes | Yes |
| Survival/Censoring | Yes | Not yet |
| Scikit-learn Integration | Yes | N/A |

Distribution Parity

All Python distributions have been ported:

  • Normal, NormalFixedVar, NormalFixedMean
  • LogNormal, Exponential, Gamma, Poisson
  • Laplace, Weibull, HalfNormal
  • StudentT, TFixedDf, TFixedDfFixedVar
  • Cauchy, CauchyFixedVar
  • Bernoulli, Categorical (k-class)
  • MultivariateNormal

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

This is a Rust port of the excellent NGBoost Python library developed by the Stanford ML Group.

Commit count: 21
