| Crates.io | oxiblas-core |
|---|---|
| lib.rs | oxiblas-core |
| version | 0.1.2 |
| created_at | 2025-12-27 22:17:14.270995+00 |
| updated_at | 2025-12-29 20:57:37.563646+00 |
| description | Core traits and SIMD abstractions for OxiBLAS |
| homepage | |
| repository | https://github.com/cool-japan/oxiblas |
| max_upload_size | |
| id | 2007915 |
| size | 406,737 |
Core traits, SIMD abstractions, and scalar types for the OxiBLAS library
oxiblas-core is the foundational crate for OxiBLAS, providing the core abstractions and building blocks used throughout the library. It is designed to be platform-agnostic with architecture-specific optimizations for x86_64 (AVX2/AVX-512) and AArch64 (NEON).
- `Scalar` - Fundamental trait for numeric types supported by BLAS/LAPACK
  - Built-in implementations: `f32`, `f64`, `Complex<f32>`, `Complex<f64>`
  - Extended precision: `f16` (half precision) and `f128` (quad precision)
- Architecture-specific vectorization with automatic fallback:
  - x86_64: AVX2 / AVX-512
  - AArch64: NEON
  - Fallback: portable scalar implementation
- `f16` (half precision) - 16-bit floating point (with the `f16` feature)
- `f128` (quad precision) - ~31 decimal digits of precision (with the `f128` feature)
- Cache size detection (e.g. via `/sys/devices/system/cpu/` on Linux)
- Optional parallelization (with the `parallel` feature)

Add this to your `Cargo.toml`:
```toml
[dependencies]
oxiblas-core = "0.1"

# With extended precision:
# oxiblas-core = { version = "0.1", features = ["f16", "f128"] }

# With parallelization:
# oxiblas-core = { version = "0.1", features = ["parallel"] }

# All features:
# oxiblas-core = { version = "0.1", features = ["f16", "f128", "parallel"] }
```
```rust
use oxiblas_core::scalar::Scalar;

fn dot_product<T: Scalar>(x: &[T], y: &[T]) -> T {
    x.iter()
        .zip(y.iter())
        .map(|(a, b)| *a * *b)
        .fold(T::zero(), |acc, v| acc + v)
}

// Works with f32, f64, Complex<f32>, Complex<f64>
let x = vec![1.0f64, 2.0, 3.0];
let y = vec![4.0f64, 5.0, 6.0];
let result = dot_product(&x, &y); // 32.0
```
```rust
use oxiblas_core::simd::{SimdType, SimdOps};

// Automatic SIMD selection based on platform
let x: Vec<f64> = vec![1.0, 2.0, 3.0, 4.0];
let y: Vec<f64> = vec![5.0, 6.0, 7.0, 8.0];
let mut result = vec![0.0; 4];

// Uses AVX2/NEON automatically if available
unsafe {
    let simd = <f64 as SimdType>::simd();
    simd.fma(&x, &y, &mut result);
    // result = x * y + result
}
```
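For reference, the elementwise semantics of the FMA call above (`result = x * y + result`) can be sketched in plain scalar Rust. This is an illustrative fallback, not oxiblas-core's actual implementation:

```rust
// Scalar reference for fused multiply-add over slices:
// result[i] = x[i] * y[i] + result[i]
fn fma_scalar(x: &[f64], y: &[f64], result: &mut [f64]) {
    assert_eq!(x.len(), y.len());
    assert_eq!(x.len(), result.len());
    for i in 0..x.len() {
        // f64::mul_add uses a hardware FMA where available,
        // performing a single rounding instead of two
        result[i] = x[i].mul_add(y[i], result[i]);
    }
}

fn main() {
    let x = vec![1.0, 2.0, 3.0, 4.0];
    let y = vec![5.0, 6.0, 7.0, 8.0];
    let mut result = vec![1.0; 4];
    fma_scalar(&x, &y, &mut result);
    assert_eq!(result, vec![6.0, 13.0, 22.0, 33.0]);
}
```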
```rust
#[cfg(feature = "f128")]
{
    use oxiblas_core::scalar::QuadFloat;

    // Quad precision (f128) - ~31 decimal digits
    let x = QuadFloat::from(2.0);
    let sqrt_x = x.sqrt();
    println!("√2 = {}", sqrt_x); // Very high precision
}
```
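Double-double ("f128") arithmetic represents a value as the unevaluated sum of two `f64`s. Its core primitive is Knuth's error-free TwoSum transformation, sketched here for illustration (not necessarily oxiblas-core's internal code):

```rust
// Knuth's TwoSum: returns (s, e) with s = fl(a + b) and a + b == s + e
// exactly, for any finite a, b. Double-double arithmetic is built from
// chains of such error-free transformations.
fn two_sum(a: f64, b: f64) -> (f64, f64) {
    let s = a + b;
    let b_virtual = s - a;
    let a_virtual = s - b_virtual;
    let b_round = b - b_virtual;
    let a_round = a - a_virtual;
    (s, a_round + b_round)
}

fn main() {
    let (s, e) = two_sum(1.0, 1e-20);
    assert_eq!(s, 1.0);   // rounded sum: the 1e-20 disappears...
    assert_eq!(e, 1e-20); // ...but is recovered exactly in the error term
}
```

The pair `(s, e)` together represents `1.0 + 1e-20` exactly, which is how two `f64`s yield roughly 31 decimal digits.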
```rust
use oxiblas_core::scalar::kahan_sum;

let values: Vec<f64> = vec![1.0, 1e-16, -1.0]; // Difficult for a naive sum
let result = kahan_sum(&values); // Accurate result via compensated summation
```
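To show what compensated summation actually does, here is a sketch of the classic Kahan algorithm (oxiblas-core's `kahan_sum` may differ in details):

```rust
// Classic Kahan (compensated) summation.
fn kahan_sum_sketch(values: &[f64]) -> f64 {
    let mut sum = 0.0;
    let mut c = 0.0; // running compensation for lost low-order bits
    for &v in values {
        let y = v - c;     // apply the compensation to the next addend
        let t = sum + y;   // low-order bits of y may be lost here...
        c = (t - sum) - y; // ...but this recovers them algebraically
        sum = t;
    }
    sum
}

fn main() {
    let values = [1.0f64, 1e-16, -1.0];
    // Naive left-to-right summation loses the 1e-16 term entirely:
    let naive: f64 = values.iter().sum();
    assert_eq!(naive, 0.0);
    // Compensated summation keeps it (up to one final rounding):
    let compensated = kahan_sum_sketch(&values);
    assert!((compensated - 1e-16).abs() < 2e-17);
}
```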
```rust
use oxiblas_core::tuning::detect_cache_sizes;

let cache = detect_cache_sizes();
println!("L1D: {} KB", cache.l1d / 1024);
println!("L2: {} KB", cache.l2 / 1024);
println!("L3: {} KB", cache.l3 / 1024);
```
```rust
use oxiblas_core::blocking::BlockParams;

// Get optimal blocking parameters for GEMM
let params = BlockParams::for_gemm::<f64>();
println!("MC={}, KC={}, NC={}", params.mc, params.kc, params.nc);
// Automatically tuned for your system's cache hierarchy
```
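To make "tuned for the cache hierarchy" concrete, here is a hypothetical sketch of Goto/BLIS-style block-size derivation. The tile sizes `MR`/`NR` and the halving heuristics are assumptions for illustration, not oxiblas-core's actual algorithm:

```rust
// Derive GEMM block sizes (mc, kc, nc) from cache sizes, Goto-style.
fn gemm_blocks(l1d: usize, l2: usize, l3: usize, elem_size: usize) -> (usize, usize, usize) {
    const MR: usize = 8; // assumed micro-kernel tile height
    const NR: usize = 8; // assumed micro-kernel tile width
    // kc: one MR x kc strip of A plus one kc x NR strip of B fit in L1d
    let kc = l1d / (elem_size * (MR + NR));
    // mc: an mc x kc packed panel of A fills about half of L2
    let mc = (l2 / 2) / (elem_size * kc) / MR * MR;
    // nc: a kc x nc packed panel of B fills about half of L3
    let nc = (l3 / 2) / (elem_size * kc) / NR * NR;
    (mc, kc, nc)
}

fn main() {
    // Example: 32 KiB L1d, 256 KiB L2, 8 MiB L3, f64 elements
    let (mc, kc, nc) = gemm_blocks(32 * 1024, 256 * 1024, 8 * 1024 * 1024, 8);
    assert_eq!((mc, kc, nc), (64, 256, 2048));
}
```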
| Feature | Description | Default |
|---|---|---|
| `default` | Core functionality (f32, f64, complex) | ✓ |
| `parallel` | Rayon-based parallelization | |
| `f16` | Half-precision (16-bit) floating point | |
| `f128` | Quad-precision (~31 digits) via double-double | |
| `nightly` | Nightly-only optimizations | |
| `force-scalar` | Disable SIMD, use scalar only (debug) | |
| `max-simd-128` | Limit to 128-bit SIMD (SSE/NEON) | |
| `max-simd-256` | Limit to 256-bit SIMD (AVX2) | |
| Platform | 128-bit | 256-bit | 512-bit |
|---|---|---|---|
| x86_64 (SSE4.1) | ✓ | | |
| x86_64 (AVX2) | ✓ | ✓ | |
| x86_64 (AVX-512) | ✓ | ✓ | ✓ |
| AArch64 (NEON) | ✓ | | |
| AArch64 (SVE) | ✓ | Planned | |
| Fallback (scalar) | ✓ | | |
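Selecting among these tiers at runtime can be done with the standard library's CPU feature detection. The following is a generic sketch, independent of how oxiblas-core performs its own dispatch:

```rust
// Report the widest available SIMD tier. On x86_64 this is a runtime
// check; NEON is baseline on AArch64, so no check is needed there.
fn simd_tier() -> &'static str {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx512f") {
            return "512-bit (AVX-512)";
        }
        if is_x86_feature_detected!("avx2") {
            return "256-bit (AVX2)";
        }
        if is_x86_feature_detected!("sse4.1") {
            return "128-bit (SSE4.1)";
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        return "128-bit (NEON)";
    }
    "scalar fallback"
}

fn main() {
    println!("Selected SIMD tier: {}", simd_tier());
}
```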
| Operation | Size | Scalar | NEON (128-bit) | Speedup |
|---|---|---|---|---|
| f64 Add | 4,096 | 15.2 µs | 7.98 µs | 1.9× |
| f64 FMA | 4,096 | 22.1 µs | 11.29 µs | 2.0× |
| f32 Add | 4,096 | 8.1 µs | 3.2 µs | 2.5× |
| f32 FMA | 4,096 | 11.5 µs | 4.8 µs | 2.4× |
| Operation | Size | Scalar | AVX2 (256-bit) | Speedup |
|---|---|---|---|---|
| f64 Add | 4,096 | 18.4 µs | 7.98 µs | 2.3× |
| f64 FMA | 4,096 | 26.7 µs | 11.29 µs | 2.4× |
| f32 Add | 4,096 | 9.8 µs | 2.1 µs | 4.7× |
| f32 FMA | 4,096 | 14.2 µs | 3.2 µs | 4.4× |
```text
oxiblas-core/
├── scalar.rs       # Scalar trait, f16, f128, extended precision
├── simd.rs         # SIMD abstraction layer
├── simd/
│   ├── avx2.rs     # AVX2/FMA kernels (x86_64)
│   ├── avx512.rs   # AVX-512 kernels (x86_64)
│   ├── neon.rs     # NEON kernels (AArch64)
│   └── scalar.rs   # Fallback scalar implementation
├── memory/
│   ├── align.rs    # Aligned allocation
│   ├── workspace.rs # Temporary buffer management
│   └── cache.rs    # Cache-aware utilities
├── blocking.rs     # Blocking parameter calculation
├── tuning.rs       # Platform detection and auto-tuning
└── parallel.rs     # Parallel operations with Rayon
```
See the examples directory in the main repository:
- `basic_simd.rs` - SIMD operations
- `extended_precision.rs` - f16 and f128 usage
- `cache_tuning.rs` - Platform-specific optimization

Run benchmarks:
```bash
# SIMD benchmarks
cargo bench --package oxiblas-core --bench simd

# Blocking parameter benchmarks
cargo bench --package oxiblas-core --bench blocking
```
Uses `unsafe` only where required (e.g. for SIMD intrinsics).

Contributions are welcome!
- `oxiblas-matrix` - Matrix types built on oxiblas-core
- `oxiblas-blas` - BLAS operations using oxiblas-core
- `oxiblas-lapack` - LAPACK decompositions
- `oxiblas` - Meta-crate with unified API

Licensed under either of:
at your option.