| Crates.io | simdly |
| lib.rs | simdly |
| version | 0.1.10 |
| created_at | 2025-06-10 09:21:19.31705+00 |
| updated_at | 2025-08-18 20:41:08.459172+00 |
| description | 🚀 High-performance Rust library leveraging SIMD and Rayon for fast computations. |
| homepage | |
| repository | https://github.com/mtantaoui/simdly |
| max_upload_size | |
| id | 1706888 |
| size | 1,437,389 |
⚠️ Development Status: This project is currently under active development. APIs may change and features are still being implemented.
🚀 A high-performance Rust library that leverages SIMD (Single Instruction, Multiple Data) instructions for fast vectorized computations. This library provides efficient implementations of mathematical operations using modern CPU features.
Add simdly to your Cargo.toml:
[dependencies]
simdly = "0.1.10"
For optimal performance, enable AVX2 support in your build configuration.
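One common way to do this (standard Cargo/rustc configuration, nothing simdly-specific) is a `.cargo/config.toml` in your project:

```toml
# .cargo/config.toml — standard rustc flag, not a simdly-specific setting.
# Enables AVX2 code generation for x86_64 targets that support it.
[build]
rustflags = ["-C", "target-feature=+avx2"]
```

Alternatively, `-C target-cpu=native` enables every feature of the build machine's CPU.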
The library provides multiple algorithms for vector operations that you can choose based on your data size, from single-threaded SIMD for small and medium arrays to Rayon-based parallel SIMD for large ones.
The library supports working with SIMD vectors directly and handles partial data efficiently. Mathematical operations come with automatic SIMD acceleration, including trigonometric functions, exponentials, square roots, powers, and distance calculations.
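As a rough illustration only — the trait name `SimdMath` and the assumption that it provides a `cos` method on `f32` slices are hypothetical here, not confirmed simdly API — element-wise math might look like this:

```rust
// Hypothetical sketch — `SimdMath` and the `cos` method on f32 slices are
// assumed names, not confirmed simdly API; check the crate docs for the
// real traits before copying this.
use simdly::SimdMath;

fn main() {
    // 16,384 f32 values = 64 KiB, one of the sizes in the benchmark table below.
    let angles: Vec<f32> = (0..16_384).map(|i| i as f32 * 0.001).collect();

    // Element-wise cosine; the library is expected to pick an AVX2/NEON path
    // automatically and handle the partial tail chunk with scalar code.
    let cosines: Vec<f32> = angles.as_slice().cos();

    println!("cos(0.0) ≈ {}", cosines[0]);
}
```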
simdly provides significant performance improvements for numerical computations, with multiple algorithm options to choose from.
Complex mathematical operations benefit from SIMD across all sizes:
| Function | Array Size | SIMD Speedup | Notes |
|---|---|---|---|
| `cos()` | 4 KiB | 4.4x | Immediate benefit |
| `cos()` | 64 KiB | 11.7x | Peak efficiency |
| `cos()` | 1 MiB | 13.3x | Best performance |
| `cos()` | 128 MiB | 9.2x | Memory-bound |
For maximum performance, compile with target-feature flags for AVX2 support, and consider using link-time optimization (LTO) and single codegen unit configuration in your release profile.
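For example (standard Cargo settings, not simdly-specific):

```toml
# Cargo.toml — typical release-profile tuning for numeric code.
[profile.release]
lto = true          # link-time optimization across crate boundaries
codegen-units = 1   # a single codegen unit gives the optimizer a whole-crate view
```

Combine this with the `target-feature` flag shown in the installation section above.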
The library provides multiple algorithms that you can choose based on your specific needs, with fine-grained control over algorithm selection. It supports vectorized mathematical operations with automatic SIMD acceleration, efficient processing of large arrays with chunking strategies, and memory-aligned operations for optimal performance on both AVX2 and NEON architectures.
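To illustrate the general idea of size-based dispatch between a single-threaded pass and parallel chunked processing, here is a conceptual sketch using Rayon directly. This is not simdly's internal implementation, and the threshold and chunk size are arbitrary example values (the crate's own `PARALLEL_SIMD_THRESHOLD` constant governs the real cutoff):

```rust
// Conceptual sketch of size-based dispatch, NOT simdly's actual code.
// Requires the `rayon` crate; threshold and chunk size are illustration values.
use rayon::prelude::*;

// 64 KiB worth of f32 elements, chosen arbitrarily for this example.
const PARALLEL_THRESHOLD: usize = 64 * 1024 / core::mem::size_of::<f32>();

fn cos_dispatch(input: &[f32]) -> Vec<f32> {
    if input.len() < PARALLEL_THRESHOLD {
        // Small/medium arrays: one pass on the current thread
        // (inside simdly this pass would be SIMD-vectorized).
        input.iter().map(|x| x.cos()).collect()
    } else {
        // Large arrays: split into chunks and process them in parallel.
        input
            .par_chunks(4096)
            .flat_map_iter(|chunk| chunk.iter().map(|x| x.cos()))
            .collect()
    }
}

fn main() {
    let data: Vec<f32> = (0..1_000_000).map(|i| i as f32 * 1e-4).collect();
    let out = cos_dispatch(&data);
    println!("first = {}, last = {}", out[0], out[out.len() - 1]);
}
```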
Clone the repository and build with cargo build --release.
Run tests with cargo test.
The crate includes comprehensive benchmarks showing real-world performance improvements:
Run benchmarks with cargo bench and view detailed reports in the target/criterion/report/ directory.
Key Findings from Benchmarks:
- Mathematical functions (`cos`, `sin`, `exp`, etc.) show significant SIMD acceleration
- Parallel processing is applied to arrays larger than `PARALLEL_SIMD_THRESHOLD`

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License - see the LICENSE file for details.
Made with ❤️ and ⚡ by Mahdi Tantaoui