| Crates.io | hodu_cpu_kernels |
| lib.rs | hodu_cpu_kernels |
| version | 0.2.4 |
| created_at | 2025-11-06 04:47:42.104225+00 |
| updated_at | 2025-11-13 22:00:47.6834+00 |
| description | hodu cpu kernels |
| homepage | |
| repository | https://github.com/hodu-rs/hodu |
| max_upload_size | |
| id | 1919080 |
| size | 925,446 |
High-performance CPU kernels for tensor operations in pure C, with optional SIMD acceleration, OpenBLAS integration, and multi-threading support.
Control build-time optimizations and dependencies with these environment variables:
HODU_DISABLE_NATIVE - Disable -march=native optimization
HODU_DISABLE_SIMD - Disable SIMD auto-detection and vectorization
HODU_DISABLE_THREADS - Disable pthread multi-threading
HODU_DISABLE_BLAS - Force disable OpenBLAS integration
OPENBLAS_DIR - OpenBLAS installation directory
OPENBLAS_INCLUDE_DIR - OpenBLAS headers directory
OPENBLAS_LIB_DIR - OpenBLAS library directory

# Default build with all optimizations
cargo build --release
# Disable multi-threading for reproducible single-threaded performance
HODU_DISABLE_THREADS=1 cargo build --release
# Build without OpenBLAS for minimal dependencies
HODU_DISABLE_BLAS=1 cargo build --release
# Build for older CPUs without native optimizations
HODU_DISABLE_NATIVE=1 cargo build --release
# Cross-compile with custom OpenBLAS
OPENBLAS_DIR=/opt/openblas cargo build --target aarch64-unknown-linux-gnu --release
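To make the effect of the `HODU_DISABLE_*` switches concrete, here is a minimal sketch of how a build script could turn them into a feature set. It is not the crate's actual `build.rs`: the `BuildConfig` type, the `from_lookup` helper, and the rule that any non-empty value other than `0` counts as "set" are all assumptions made for illustration.

```rust
/// Hypothetical summary of which optional features a build would enable.
#[derive(Debug, PartialEq)]
struct BuildConfig {
    native: bool,
    simd: bool,
    threads: bool,
    blas: bool,
}

impl BuildConfig {
    /// Builds the config from any key -> value lookup (for example the process
    /// environment). Assumption: a variable disables its feature when it is
    /// set to a non-empty value other than "0".
    fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> Self {
        let disabled = |key: &str| {
            lookup(key).map_or(false, |value| !value.is_empty() && value != "0")
        };
        BuildConfig {
            native: !disabled("HODU_DISABLE_NATIVE"),
            simd: !disabled("HODU_DISABLE_SIMD"),
            threads: !disabled("HODU_DISABLE_THREADS"),
            blas: !disabled("HODU_DISABLE_BLAS"),
        }
    }
}

fn main() {
    // Read the real environment, as a build script would.
    let config = BuildConfig::from_lookup(|key| std::env::var(key).ok());
    println!("{:?}", config);
}
```

Taking a lookup closure instead of calling `std::env::var` directly keeps the decision logic testable without mutating the process environment.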
kernels/
├── atomic.h # Thread-safe atomic operations
├── constants.h # Math constants
├── math_utils.h # Math helper functions
├── simd_utils.h # SIMD abstractions (AVX2/SSE2/NEON)
├── thread_utils.h # Multi-threading utilities (pthread)
├── types.h # Data type definitions
├── utils.h # Tensor utilities
├── ops_binary.h/c # Binary operations
├── ops_concat_split.h/c # Concat/split operations
├── ops_conv.h/c # Convolution operations
├── ops_indexing.h/c # Indexing operations
├── ops_matrix.h/c # Matrix operations
├── ops_reduce.h/c # Reduction operations
├── ops_unary.h/c # Unary operations
└── ops_windowing.h/c # Windowing/pooling operations
no_std) and general-purpose platforms

# Standard build
cargo build --release
# With OpenBLAS (auto-detected on Linux/macOS)
brew install openblas gfortran # macOS
# or
sudo apt install libopenblas-dev pkg-config gfortran # Linux
# Cross-compilation example (ARM with OpenBLAS)
OPENBLAS_DIR=/opt/arm-openblas cargo build --target aarch64-unknown-linux-gnu
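The three `OPENBLAS_*` variables suggest a simple precedence rule for locating a custom OpenBLAS. The sketch below shows one plausible resolution order and is not taken from the crate's build script: the `openblas_dirs` function is hypothetical, and the fallback to `include` and `lib` subdirectories of `OPENBLAS_DIR` is an assumption. The `cargo:rustc-link-search` and `cargo:rustc-link-lib` lines are standard Cargo build-script directives.

```rust
use std::path::PathBuf;

/// Hypothetical resolver returning (include_dir, lib_dir).
/// Assumption: the specific variables win, otherwise fall back to
/// `$OPENBLAS_DIR/include` and `$OPENBLAS_DIR/lib`.
fn openblas_dirs(lookup: impl Fn(&str) -> Option<String>) -> Option<(PathBuf, PathBuf)> {
    let root = lookup("OPENBLAS_DIR").map(PathBuf::from);
    let include = lookup("OPENBLAS_INCLUDE_DIR")
        .map(PathBuf::from)
        .or_else(|| root.as_ref().map(|r| r.join("include")))?;
    let lib = lookup("OPENBLAS_LIB_DIR")
        .map(PathBuf::from)
        .or_else(|| root.as_ref().map(|r| r.join("lib")))?;
    Some((include, lib))
}

fn main() {
    match openblas_dirs(|key| std::env::var(key).ok()) {
        Some((include, lib)) => {
            // Tell Cargo where to find the library and which one to link.
            println!("cargo:rustc-link-search=native={}", lib.display());
            println!("cargo:rustc-link-lib=openblas");
            eprintln!("OpenBLAS headers expected in {}", include.display());
        }
        None => eprintln!("no OpenBLAS location configured"),
    }
}
```

With only `OPENBLAS_DIR=/opt/arm-openblas` set, this sketch would look for headers in `/opt/arm-openblas/include` and libraries in `/opt/arm-openblas/lib`.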
# Run all tests
cargo test
# Test specific operation categories
cargo test --test ops_matrix
cargo test --test ops_unary
cargo test --test ops_binary
use hodu_cpu_kernels::*;

let input = vec![1.0f32; 100];
let mut output = vec![0.0f32; 100];

// Direct FFI call to C kernel
unsafe {
    ops_unary::unary_relu_f32(
        input.as_ptr() as *const _,
        output.as_mut_ptr() as *mut _,
        100,
        0,
        std::ptr::null(),
    );
}
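When calling a kernel through FFI, it helps to compare its output against a plain Rust reference. The sketch below is independent of the crate: `relu_reference` and `all_close` are hypothetical helpers, and the tolerance is an arbitrary choice. It shows the kind of check one could run on the `output` buffer filled by the ReLU call above.

```rust
/// Reference ReLU in safe Rust: max(x, 0) for each element.
/// Note: `f32::max` returns the non-NaN argument, so NaN inputs map to 0.0
/// here, which may differ from what a C kernel does.
fn relu_reference(input: &[f32]) -> Vec<f32> {
    input.iter().map(|&x| x.max(0.0)).collect()
}

/// Returns true when both slices have the same length and every pair of
/// elements differs by at most `tol`.
fn all_close(actual: &[f32], expected: &[f32], tol: f32) -> bool {
    actual.len() == expected.len()
        && actual
            .iter()
            .zip(expected)
            .all(|(a, e)| (a - e).abs() <= tol)
}

fn main() {
    let input = vec![-2.0f32, -0.5, 0.0, 1.5];
    let expected = relu_reference(&input);
    // In a real test, `actual` would be the buffer written by the C kernel.
    let actual = expected.clone();
    println!("matches reference: {}", all_close(&actual, &expected, 1e-6));
}
```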
BSD-3-Clause