| Crates.io | tenrso-exec |
| lib.rs | tenrso-exec |
| version | 0.1.0-alpha.2 |
| created_at | 2025-11-08 09:05:57.771911+00 |
| updated_at | 2025-12-17 06:38:36.480383+00 |
| description | Unified execution API for TenRSo |
| homepage | https://github.com/cool-japan/tenrso |
| repository | https://github.com/cool-japan/tenrso |
| id | 1922676 |
| size | 580,944 |
Unified execution API for TenRSo tensor operations.
tenrso-exec provides the main user-facing API for executing tensor operations:
- `einsum_ex` - unified einsum contraction interface

All tensor operations (dense, sparse, low-rank) go through this unified interface.
```toml
[dependencies]
tenrso-exec = "0.1"
```
```rust
use tenrso_exec::{einsum_ex, ExecHints};

// Simple matrix multiplication
let C = einsum_ex::<f32>("ij,jk->ik")
    .inputs(&[A, B])
    .run()?;

// Tensor contraction with optimization hints
let result = einsum_ex::<f32>("bij,bjk->bik")
    .inputs(&[A, B])
    .hints(&ExecHints {
        prefer_lowrank: true,
        prefer_sparse: true,
        tile_kb: Some(512),
        ..Default::default()
    })
    .run()?;
```
```rust
use tenrso_exec::{CpuExecutor, TenrsoExecutor, ElemOp, ReduceOp};

let mut exec = CpuExecutor::new();

// Element-wise operation
let abs_tensor = exec.elem_op(ElemOp::Abs, &tensor)?;

// Reduction
let sum = exec.reduce(ReduceOp::Sum, &tensor, &[0, 1])?;
```
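Contractions can also be dispatched directly through an executor via the `TenrsoExecutor::einsum` trait method (see the API reference below), bypassing the `einsum_ex` builder. A minimal sketch, assuming `a` and `b` are existing `TensorHandle<f32>` values:

```rust
use tenrso_exec::{CpuExecutor, ExecHints, TenrsoExecutor};

let mut exec = CpuExecutor::new();

// Same contraction as the builder example, dispatched through the
// trait method with default hints. `a` and `b` are assumed to be
// existing TensorHandle<f32> values.
let c = exec.einsum("ij,jk->ik", &[a, b], &ExecHints::default())?;
```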
tenrso-exec includes advanced optimization features that can be configured per executor:
```rust
use tenrso_exec::CpuExecutor;

// Default: all optimizations enabled
let mut exec = CpuExecutor::new();

// Custom configuration with selective optimizations
let mut exec = CpuExecutor::new()
    .with_simd(true)                  // SIMD-accelerated operations
    .with_tiled_reductions(true)      // Cache-friendly blocked reductions
    .with_vectorized_broadcast(true); // Optimized broadcasting patterns

// Disable all optimizations (for debugging or baseline comparison)
let mut exec = CpuExecutor::unoptimized();
```
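One use of `CpuExecutor::unoptimized()` is numerical verification: run the same operation through an optimized and an unoptimized executor and compare the outputs. A minimal sketch, where `tensor` is an existing `TensorHandle<f32>` and `assert_close` is a hypothetical comparison helper, not part of tenrso-exec:

```rust
use tenrso_exec::{CpuExecutor, ReduceOp, TenrsoExecutor};

let mut baseline = CpuExecutor::unoptimized();
let mut optimized = CpuExecutor::new();

// Run the same reduction through both executors; the unoptimized
// result serves as the numerical reference.
let expected = baseline.reduce(ReduceOp::Sum, &tensor, &[0])?;
let actual = optimized.reduce(ReduceOp::Sum, &tensor, &[0])?;

// Hypothetical helper: elementwise comparison within a floating-point
// tolerance (not provided by tenrso-exec).
assert_close(&expected, &actual);
```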
Three flags control these optimizations:

- SIMD Operations (`enable_simd`): SIMD-accelerated operations
- Tiled Reductions (`enable_tiled_reductions`): cache-friendly blocked reductions
- Vectorized Broadcasting (`enable_vectorized_broadcast`): optimized broadcasting patterns

As a rule of thumb, keep all three enabled for throughput on modern CPUs, and disable them when debugging or establishing a baseline for numerical comparison.
Recommended configurations:

- Most workloads: `CpuExecutor::new()` (the default, with all optimizations enabled, is optimal)
- Debugging or numerical verification: `CpuExecutor::unoptimized()`
- Memory-constrained environments: `CpuExecutor::new().with_tiled_reductions(false)` (reduces the memory footprint)
- Maximum throughput on modern CPUs: `CpuExecutor::new()` (all optimizations are enabled by default)
Run comprehensive benchmarks to measure optimization impact:
```bash
# Run all benchmarks
cargo bench

# Run optimization-specific benchmarks
cargo bench --bench optimization_benchmarks

# Compare optimized vs unoptimized performance
cargo bench --bench optimization_benchmarks -- simd
cargo bench --bench optimization_benchmarks -- tiled
```
Benchmark results compare the optimized and unoptimized code paths, so the impact of each flag can be measured directly.
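The contents of the crate's benchmark suite are not shown here. As an illustration only, a comparison benchmark of this kind might look like the following sketch, assuming a Criterion harness and a hypothetical `make_test_tensor` helper (neither is confirmed by this README):

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use tenrso_exec::{CpuExecutor, ReduceOp, TenrsoExecutor};

fn bench_reduce(c: &mut Criterion) {
    // Hypothetical helper that builds a TensorHandle<f32>;
    // not part of tenrso-exec.
    let tensor = make_test_tensor();

    c.bench_function("reduce_sum/optimized", |b| {
        let mut exec = CpuExecutor::new();
        b.iter(|| exec.reduce(ReduceOp::Sum, &tensor, &[0]).unwrap());
    });

    c.bench_function("reduce_sum/unoptimized", |b| {
        let mut exec = CpuExecutor::unoptimized();
        b.iter(|| exec.reduce(ReduceOp::Sum, &tensor, &[0]).unwrap());
    });
}

criterion_group!(benches, bench_reduce);
criterion_main!(benches);
```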
```rust
// Builder entry point for einsum contractions
pub fn einsum_ex<T>(spec: &str) -> EinsumBuilder<T>;

impl<T> EinsumBuilder<T> {
    pub fn inputs(self, tensors: &[TensorHandle<T>]) -> Self;
    pub fn hints(self, hints: &ExecHints) -> Self;
    pub fn run(self) -> Result<TensorHandle<T>>;
}

// Optional execution hints passed to a contraction
pub struct ExecHints {
    pub mask: Option<MaskPack>,
    pub subset: Option<SubsetSpec>,
    pub prefer_sparse: bool,
    pub prefer_lowrank: bool,
    pub tile_kb: Option<usize>,
}

// Trait implemented by execution backends such as CpuExecutor
pub trait TenrsoExecutor<T> {
    fn einsum(&mut self, spec: &str, inputs: &[TensorHandle<T>], hints: &ExecHints)
        -> Result<TensorHandle<T>>;
    fn elem_op(&mut self, op: ElemOp, x: &TensorHandle<T>) -> Result<TensorHandle<T>>;
    fn reduce(&mut self, op: ReduceOp, x: &TensorHandle<T>, axes: &[Axis])
        -> Result<TensorHandle<T>>;
}
```
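Because the trait abstracts over the backend, higher-level code can stay executor-agnostic. A minimal illustrative sketch (the helper function is not part of the crate; it assumes `TensorHandle` and the `Result` alias are re-exported from `tenrso_exec`):

```rust
use tenrso_exec::{ExecHints, Result, TenrsoExecutor, TensorHandle};

// Illustrative helper: batched matmul through any backend that
// implements TenrsoExecutor, e.g. CpuExecutor.
fn batched_matmul<T, E: TenrsoExecutor<T>>(
    exec: &mut E,
    a: TensorHandle<T>,
    b: TensorHandle<T>,
) -> Result<TensorHandle<T>> {
    exec.einsum("bij,bjk->bik", &[a, b], &ExecHints::default())
}
```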
Apache-2.0