| Field | Value |
|---|---|
| Crates.io | axonml-fusion |
| lib.rs | axonml-fusion |
| version | 0.2.4 |
| created_at | 2026-01-24 14:04:40.159157+00 |
| updated_at | 2026-01-25 22:37:21.899727+00 |
| description | Kernel fusion optimization for the Axonml ML framework |
| homepage | |
| repository | https://github.com/automatanexus/axonml |
| max_upload_size | |
| id | 2066732 |
| size | 63,660 |
axonml-fusion provides kernel fusion support for combining multiple operations into single optimized kernels. By reducing memory bandwidth requirements and kernel launch overhead, fusion significantly improves performance for neural network inference and training workloads.
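For intuition, the effect of fusion on an elementwise chain can be shown in plain Rust (illustrative only; axonml's actual kernels are not shown here). Unfused, `relu(x * 2 + 1)` makes three passes over memory with two intermediate buffers; fused, it makes one pass with none:

```rust
// Unfused: three separate passes, each reading and writing the full array.
fn unfused(x: &[f32]) -> Vec<f32> {
    let scaled: Vec<f32> = x.iter().map(|v| v * 2.0).collect();      // pass 1
    let biased: Vec<f32> = scaled.iter().map(|v| v + 1.0).collect(); // pass 2
    biased.iter().map(|v| v.max(0.0)).collect()                      // pass 3
}

// Fused: one pass; each element is read once, transformed in registers,
// and written once. This is the memory-bandwidth saving fusion targets.
fn fused(x: &[f32]) -> Vec<f32> {
    x.iter().map(|v| (v * 2.0 + 1.0).max(0.0)).collect()
}

fn main() {
    let x = [-1.0f32, 0.0, 3.0];
    assert_eq!(unfused(&x), fused(&x));
    println!("{:?}", fused(&x)); // [0.0, 1.0, 7.0]
}
```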
| Module | Description |
|---|---|
| patterns | Fusion pattern definitions and detection algorithms for MatMul, Conv, and Elementwise patterns |
| elementwise | Fused elementwise operations with builder pattern for chaining Add, Mul, ReLU, Sigmoid, etc. |
| linear | Fused linear layer operations combining MatMul + Bias + Activation |
| optimizer | Graph fusion optimizer with configurable passes and statistics tracking |
| error | Error types and Result alias for fusion operations |
Add this to your Cargo.toml:

```toml
[dependencies]
axonml-fusion = "0.2.4"
```
```rust
use axonml_fusion::{fuse_matmul_bias_relu, FusedLinear, Activation};
use axonml_tensor::Tensor;

// Create weight and bias tensors
let weight = Tensor::from_vec(vec![1.0, 2.0, 3.0, 4.0], &[2, 2])?;
let bias = Tensor::from_vec(vec![0.5, 0.5], &[2])?;

// Create fused MatMul + Bias + ReLU operation
let fused = fuse_matmul_bias_relu(&weight, &bias)?;

// Execute fused operation
let input = Tensor::from_vec(vec![1.0, 1.0], &[2])?;
let output = fused.forward(&input)?;
```
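What a fused linear layer computes can be sketched in plain Rust: `out = relu(W·x + b)`, with the bias add and ReLU applied while each output element is still in a register, so no intermediate buffers are materialized. This is an illustrative sketch, not axonml's implementation:

```rust
// Naive row-major matmul with bias add and ReLU fused into the output write.
fn matmul_bias_relu(w: &[f32], rows: usize, cols: usize, b: &[f32], x: &[f32]) -> Vec<f32> {
    (0..rows)
        .map(|r| {
            let dot: f32 = (0..cols).map(|c| w[r * cols + c] * x[c]).sum();
            (dot + b[r]).max(0.0) // bias + ReLU fused into the same write
        })
        .collect()
}

fn main() {
    // Mirrors the example above: W = [[1, 2], [3, 4]], b = [0.5, 0.5], x = [1, 1]
    let out = matmul_bias_relu(&[1.0, 2.0, 3.0, 4.0], 2, 2, &[0.5, 0.5], &[1.0, 1.0]);
    assert_eq!(out, vec![3.5, 7.5]);
}
```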
```rust
use axonml_fusion::{FusedElementwise, fused_scale_bias_relu};

// Build a fused elementwise chain using the builder
let fused = FusedElementwise::builder()
    .mul(2.0) // Scale by 2
    .add(1.0) // Add bias
    .relu()   // Apply ReLU
    .build();
let output = fused.forward(&input)?;

// Or use convenience functions
let scale_bias_relu = fused_scale_bias_relu(2.0, 1.0);
```
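One way such a builder can work under the hood is to record the chain as a list of per-element ops and apply them in a single pass. The following is a hypothetical plain-Rust sketch of that mechanic, not axonml's actual internals:

```rust
// Hypothetical elementwise-chain builder: each builder call appends an op,
// and forward() applies the whole chain per element in one pass.
#[derive(Clone, Copy)]
enum ElemOp {
    Mul(f32),
    Add(f32),
    Relu,
}

struct Chain(Vec<ElemOp>);

impl Chain {
    fn new() -> Self { Chain(Vec::new()) }
    fn mul(mut self, c: f32) -> Self { self.0.push(ElemOp::Mul(c)); self }
    fn add(mut self, c: f32) -> Self { self.0.push(ElemOp::Add(c)); self }
    fn relu(mut self) -> Self { self.0.push(ElemOp::Relu); self }

    // Single pass: every op in the chain runs while the element is in a
    // register, so the array is read and written exactly once.
    fn forward(&self, xs: &[f32]) -> Vec<f32> {
        xs.iter()
            .map(|&x| {
                self.0.iter().fold(x, |v, op| match op {
                    ElemOp::Mul(c) => v * c,
                    ElemOp::Add(c) => v + c,
                    ElemOp::Relu => v.max(0.0),
                })
            })
            .collect()
    }
}

fn main() {
    let chain = Chain::new().mul(2.0).add(1.0).relu();
    assert_eq!(chain.forward(&[-1.0, 0.5]), vec![0.0, 2.0]);
}
```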
```rust
use axonml_fusion::{optimize_graph, FusionConfig, OpType};

// Define operation sequence
let ops = vec![
    OpType::MatMul,
    OpType::Add,
    OpType::Relu,
    OpType::Add,
    OpType::Mul,
];

// Optimize with default configuration
let (patterns, stats) = optimize_graph(&ops, None)?;
println!("Fusions applied: {}", stats.fusions_applied);
println!("Operations eliminated: {}", stats.ops_eliminated);
println!("Estimated speedup: {:.2}x", stats.estimated_speedup);
```
```rust
use axonml_fusion::{FusionOptimizer, FusionConfig};

// Create conservative configuration
let config = FusionConfig::conservative();

// Or customize specific settings
let config = FusionConfig {
    fuse_elementwise: true,
    fuse_linear: true,
    fuse_conv: false,
    min_elementwise_chain: 3,
    aggressive: false,
};

let mut optimizer = FusionOptimizer::with_config(config);
let patterns = optimizer.analyze(&ops);
```
| Pattern | Operations | Estimated Speedup |
|---|---|---|
| MatMul + Bias | MatMul, Add | 1.2x |
| MatMul + Bias + ReLU | MatMul, Add, ReLU | 1.3x |
| MatMul + Bias + GELU | MatMul, Add, GELU | 1.3x |
| Conv + BatchNorm | Conv, BatchNorm | 1.3x |
| Conv + BatchNorm + ReLU | Conv, BatchNorm, ReLU | 1.4x |
| Elementwise Chain | Multiple elementwise ops | 2.0x |
| Add + ReLU | Add, ReLU | 1.8x |
| Mul + Add (FMA) | Mul, Add | 1.5x |
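Pattern detection of this kind can be sketched as a peephole scan over an op sequence: slide a window across the ops and record where a known pattern matches. A real fusion pass works on a dataflow graph with legality checks; this plain-Rust sketch (independent of axonml) shows only the windowing idea:

```rust
// Minimal peephole matcher over a linear op sequence.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Op {
    MatMul,
    Add,
    Relu,
    Mul,
}

// Returns the start index of every MatMul + Add + ReLU window,
// i.e. every site where a fused MatMul + Bias + ReLU kernel could apply.
fn find_matmul_bias_relu(ops: &[Op]) -> Vec<usize> {
    ops.windows(3)
        .enumerate()
        .filter(|&(_, w)| w == [Op::MatMul, Op::Add, Op::Relu])
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let ops = [
        Op::MatMul, Op::Add, Op::Relu, // fusible window at index 0
        Op::Mul,
        Op::MatMul, Op::Add, Op::Relu, // fusible window at index 4
    ];
    assert_eq!(find_matmul_bias_relu(&ops), vec![0, 4]);
}
```

Each match replaces three kernel launches with one, which is where the eliminated-ops and speedup statistics in the table come from.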
The FusedElementwise builder supports:

- `add(f32)` - Add constant
- `mul(f32)` - Multiply by constant
- `relu()` - ReLU activation
- `leaky_relu(f32)` - Leaky ReLU with alpha
- `sigmoid()` - Sigmoid activation
- `tanh()` - Hyperbolic tangent
- `exp()` - Exponential
- `log()` - Natural logarithm
- `sqrt()` - Square root
- `square()` - Square
- `clamp(f32, f32)` - Clamp to range
- `neg()` - Negation
- `abs()` - Absolute value

Run the test suite:

```shell
cargo test -p axonml-fusion
```
Licensed under either of:

- Apache License, Version 2.0
- MIT License

at your option.