| Field | Value |
|---|---|
| Crates.io | axonml-fusion |
| lib.rs | axonml-fusion |
| version | 0.2.4 |
| created_at | 2026-01-24 14:04:40.159157+00 |
| updated_at | 2026-01-25 22:37:21.899727+00 |
| description | Kernel fusion optimization for the Axonml ML framework |
| homepage | |
| repository | https://github.com/automatanexus/axonml |
| max_upload_size | |
| id | 2066732 |
| size | 63,660 |
axonml-fusion provides kernel fusion support for combining multiple operations into single optimized kernels. By reducing memory bandwidth requirements and kernel launch overhead, fusion significantly improves performance for neural network inference and training workloads.
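For intuition, the effect of fusion on an elementwise chain can be shown in plain Rust (illustrative only; axonml's actual kernels are not shown here). Unfused, `relu(x * 2 + 1)` makes three passes over memory with two intermediate buffers; fused, it makes one pass with none:

```rust
// Unfused: three separate passes, each reading and writing the full array.
fn unfused(x: &[f32]) -> Vec<f32> {
    let scaled: Vec<f32> = x.iter().map(|v| v * 2.0).collect();      // pass 1
    let biased: Vec<f32> = scaled.iter().map(|v| v + 1.0).collect(); // pass 2
    biased.iter().map(|v| v.max(0.0)).collect()                      // pass 3
}

// Fused: one pass; each element is read once, transformed in registers,
// and written once. This is the memory-bandwidth saving fusion targets.
fn fused(x: &[f32]) -> Vec<f32> {
    x.iter().map(|v| (v * 2.0 + 1.0).max(0.0)).collect()
}

fn main() {
    let x = [-1.0f32, 0.0, 3.0];
    assert_eq!(unfused(&x), fused(&x));
    println!("{:?}", fused(&x)); // [0.0, 1.0, 7.0]
}
```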
| Module | Description |
|---|---|
| patterns | Fusion pattern definitions and detection algorithms for MatMul, Conv, and Elementwise patterns |
| elementwise | Fused elementwise operations with builder pattern for chaining Add, Mul, ReLU, Sigmoid, etc. |
| linear | Fused linear layer operations combining MatMul + Bias + Activation |
| optimizer | Graph fusion optimizer with configurable passes and statistics tracking |
| error | Error types and Result alias for fusion operations |
Add this to your Cargo.toml:

```toml
[dependencies]
axonml-fusion = "0.2.4"
```
```rust
use axonml_fusion::{fuse_matmul_bias_relu, FusedLinear, Activation};
use axonml_tensor::Tensor;

// Create weight and bias tensors
let weight = Tensor::from_vec(vec![1.0, 2.0, 3.0, 4.0], &[2, 2])?;
let bias = Tensor::from_vec(vec![0.5, 0.5], &[2])?;

// Create fused MatMul + Bias + ReLU operation
let fused = fuse_matmul_bias_relu(&weight, &bias)?;

// Execute fused operation
let input = Tensor::from_vec(vec![1.0, 1.0], &[2])?;
let output = fused.forward(&input)?;
```
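What a fused linear layer computes can be sketched in plain Rust: `out = relu(W·x + b)`, with the bias add and ReLU applied while each output element is still in a register, so no intermediate buffers are materialized. This is an illustrative sketch, not axonml's implementation:

```rust
// Naive row-major matmul with bias add and ReLU fused into the output write.
fn matmul_bias_relu(w: &[f32], rows: usize, cols: usize, b: &[f32], x: &[f32]) -> Vec<f32> {
    (0..rows)
        .map(|r| {
            let dot: f32 = (0..cols).map(|c| w[r * cols + c] * x[c]).sum();
            (dot + b[r]).max(0.0) // bias + ReLU fused into the same write
        })
        .collect()
}

fn main() {
    // Mirrors the example above: W = [[1, 2], [3, 4]], b = [0.5, 0.5], x = [1, 1]
    let out = matmul_bias_relu(&[1.0, 2.0, 3.0, 4.0], 2, 2, &[0.5, 0.5], &[1.0, 1.0]);
    assert_eq!(out, vec![3.5, 7.5]);
}
```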
```rust
use axonml_fusion::{FusedElementwise, fused_scale_bias_relu};

// Build a fused elementwise chain using the builder
let fused = FusedElementwise::builder()
    .mul(2.0) // Scale by 2
    .add(1.0) // Add bias
    .relu()   // Apply ReLU
    .build();
let output = fused.forward(&input)?;

// Or use convenience functions
let scale_bias_relu = fused_scale_bias_relu(2.0, 1.0);
```
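One way such a builder can work under the hood is to record the chain as a list of per-element ops and apply them in a single pass. The following is a hypothetical plain-Rust sketch of that mechanic, not axonml's actual internals:

```rust
// Hypothetical elementwise-chain builder: each builder call appends an op,
// and forward() applies the whole chain per element in one pass.
#[derive(Clone, Copy)]
enum ElemOp {
    Mul(f32),
    Add(f32),
    Relu,
}

struct Chain(Vec<ElemOp>);

impl Chain {
    fn new() -> Self { Chain(Vec::new()) }
    fn mul(mut self, c: f32) -> Self { self.0.push(ElemOp::Mul(c)); self }
    fn add(mut self, c: f32) -> Self { self.0.push(ElemOp::Add(c)); self }
    fn relu(mut self) -> Self { self.0.push(ElemOp::Relu); self }

    // Single pass: every op in the chain runs while the element is in a
    // register, so the array is read and written exactly once.
    fn forward(&self, xs: &[f32]) -> Vec<f32> {
        xs.iter()
            .map(|&x| {
                self.0.iter().fold(x, |v, op| match op {
                    ElemOp::Mul(c) => v * c,
                    ElemOp::Add(c) => v + c,
                    ElemOp::Relu => v.max(0.0),
                })
            })
            .collect()
    }
}

fn main() {
    let chain = Chain::new().mul(2.0).add(1.0).relu();
    assert_eq!(chain.forward(&[-1.0, 0.5]), vec![0.0, 2.0]);
}
```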
```rust
use axonml_fusion::{optimize_graph, FusionConfig, OpType};

// Define operation sequence
let ops = vec![
    OpType::MatMul,
    OpType::Add,
    OpType::Relu,
    OpType::Add,
    OpType::Mul,
];

// Optimize with default configuration
let (patterns, stats) = optimize_graph(&ops, None)?;
println!("Fusions applied: {}", stats.fusions_applied);
println!("Operations eliminated: {}", stats.ops_eliminated);
println!("Estimated speedup: {:.2}x", stats.estimated_speedup);
```
```rust
use axonml_fusion::{FusionOptimizer, FusionConfig};

// Create conservative configuration
let config = FusionConfig::conservative();

// Or customize specific settings
let config = FusionConfig {
    fuse_elementwise: true,
    fuse_linear: true,
    fuse_conv: false,
    min_elementwise_chain: 3,
    aggressive: false,
};

let mut optimizer = FusionOptimizer::with_config(config);
let patterns = optimizer.analyze(&ops);
```
| Pattern | Operations | Estimated Speedup |
|---|---|---|
| MatMul + Bias | MatMul, Add | 1.2x |
| MatMul + Bias + ReLU | MatMul, Add, ReLU | 1.3x |
| MatMul + Bias + GELU | MatMul, Add, GELU | 1.3x |
| Conv + BatchNorm | Conv, BatchNorm | 1.3x |
| Conv + BatchNorm + ReLU | Conv, BatchNorm, ReLU | 1.4x |
| Elementwise Chain | Multiple elementwise ops | 2.0x |
| Add + ReLU | Add, ReLU | 1.8x |
| Mul + Add (FMA) | Mul, Add | 1.5x |
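Pattern detection of this kind can be sketched as a peephole scan over an op sequence: slide a window across the ops and record where a known pattern matches. A real fusion pass works on a dataflow graph with legality checks; this plain-Rust sketch (independent of axonml) shows only the windowing idea:

```rust
// Minimal peephole matcher over a linear op sequence.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Op {
    MatMul,
    Add,
    Relu,
    Mul,
}

// Returns the start index of every MatMul + Add + ReLU window,
// i.e. every site where a fused MatMul + Bias + ReLU kernel could apply.
fn find_matmul_bias_relu(ops: &[Op]) -> Vec<usize> {
    ops.windows(3)
        .enumerate()
        .filter(|&(_, w)| w == [Op::MatMul, Op::Add, Op::Relu])
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let ops = [
        Op::MatMul, Op::Add, Op::Relu, // fusible window at index 0
        Op::Mul,
        Op::MatMul, Op::Add, Op::Relu, // fusible window at index 4
    ];
    assert_eq!(find_matmul_bias_relu(&ops), vec![0, 4]);
}
```

Each match replaces three kernel launches with one, which is where the eliminated-ops and speedup statistics in the table come from.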
The FusedElementwise builder supports:

- `add(f32)` - Add constant
- `mul(f32)` - Multiply by constant
- `relu()` - ReLU activation
- `leaky_relu(f32)` - Leaky ReLU with alpha
- `sigmoid()` - Sigmoid activation
- `tanh()` - Hyperbolic tangent
- `exp()` - Exponential
- `log()` - Natural logarithm
- `sqrt()` - Square root
- `square()` - Square
- `clamp(f32, f32)` - Clamp to range
- `neg()` - Negation
- `abs()` - Absolute value

Run the test suite:

```shell
cargo test -p axonml-fusion
```
Licensed under either of:

- Apache License, Version 2.0
- MIT License

at your option.