| Crates.io | avx-parallel |
| lib.rs | avx-parallel |
| version | 0.4.0 |
| created_at | 2025-12-17 03:28:33.801177+00 |
| updated_at | 2025-12-17 03:28:33.801177+00 |
| description | Zero-dependency parallel library with work stealing, SIMD, lock-free operations, adaptive execution, and memory-efficient algorithms |
| homepage | https://avila.inc |
| repository | https://github.com/avilaops/arxis |
| max_upload_size | |
| id | 1989249 |
| size | 255,561 |
A zero-dependency parallel computation library for Rust with true parallel execution and advanced performance features.
Built on std::thread::scope and only the standard library (std::thread, std::sync), with no external dependencies.
Add to your Cargo.toml:
[dependencies]
avx-parallel = "0.4.0"
use avx_parallel::prelude::*;
fn main() {
// Parallel iteration
let data = vec![1, 2, 3, 4, 5];
let sum: i32 = data.par_iter()
.map(|x| x * 2)
.sum();
println!("Sum: {}", sum); // Sum: 30
// High-performance par_vec API
let results: Vec<i32> = data.par_vec()
.map(|&x| x * x)
.collect();
println!("{:?}", results); // [1, 4, 9, 16, 25]
// Lock-free counting (v0.4.0)
let count = lockfree_count(&data, |x| x > &2);
println!("Count: {}", count); // Count: 3
}
- map - Transform each element
- filter - Keep elements matching predicate
- cloned - Clone elements (for reference iterators)
- sum - Sum all elements
- reduce - Reduce with custom operation
- fold - Fold with identity and operation
- count - Count elements matching predicate
- find_any - Find any element matching predicate
- all - Check if all elements match
- any - Check if any element matches
- parallel_sort - Parallel merge sort
- parallel_sort_by - Sort with custom comparator
- parallel_zip - Combine two slices element-wise
- parallel_chunks - Process data in fixed-size chunks
- partition - Split into two vectors based on predicate
- work_stealing_map - Map with dynamic load balancing
- WorkStealingPool - Thread pool with work stealing
- simd_sum_* - SIMD-accelerated sum operations
- simd_dot_* - SIMD dot product
- ThreadPoolConfig - Advanced thread pool configuration
- lockfree_count - Atomic-based counting without locks
- lockfree_any / lockfree_all - Lock-free search with early exit
- AdaptiveExecutor - Learning executor that optimizes chunk sizes
- speculative_execute - Auto-select parallel vs sequential
- cache_aware_map - Cache-line optimized transformations
- parallel_transform_inplace - Zero-allocation transformations

The library automatically detects the available CPU cores (via std::thread::available_parallelism()) and splits work into appropriately sized chunks.
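Most of these helpers are demonstrated later in this README. For two that are not shown elsewhere, here is a hedged sketch; the signatures are assumptions inferred from the names and from the parallel_sort example further down (parallel_sort_by presumably takes a mutable slice and a comparator, parallel_zip presumably pairs two slices), so check docs.rs/avx-parallel for the actual API:
use avx_parallel::{parallel_sort_by, parallel_zip};
fn main() {
    // Assumed signature: parallel_sort_by(&mut [T], impl Fn(&T, &T) -> std::cmp::Ordering)
    let mut scores = vec![42, 7, 19, 88, 3];
    parallel_sort_by(&mut scores, |a, b| b.cmp(a)); // descending order
    println!("{:?}", scores);
    // Assumed signature: parallel_zip(&[A], &[B]) -> Vec<(A, B)>
    let ids = vec![1, 2, 3];
    let names = vec!["a", "b", "c"];
    let pairs = parallel_zip(&ids, &names);
    println!("{:?}", pairs);
}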
| Operation | Dataset | Sequential | Parallel (v0.3.0) | Parallel (v0.4.0) | Speedup |
|---|---|---|---|---|---|
| Sum | 1M | 2.5ms | 1.1ms | 0.9ms | 2.78x |
| Filter | 1M | 45ms | 15ms | 12ms | 3.75x |
| Count (lock-free) | 1M | 8ms | 4ms | 2.5ms | 3.20x |
| Sort | 1M | 82ms | 25ms | 25ms | 3.28x |
| Complex Compute | 100K | 230ms | 75ms | 65ms | 3.54x |
Note: For simple operations (<100µs per element), sequential may be faster due to thread overhead.
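This is the trade-off the speculative_execute helper listed above is meant to handle. As an illustration of the idea only (not the crate's implementation), here is a plain-std sketch that samples the per-element cost first and only spawns threads when the work is heavy enough:
use std::time::Instant;
// Illustration only (not avx-parallel's `speculative_execute`): sample a few
// elements to estimate per-element cost, then pick sequential or parallel.
fn map_adaptively<T: Sync, R: Send>(data: &[T], f: impl Fn(&T) -> R + Sync) -> Vec<R> {
    let sample = data.len().min(16);
    let start = Instant::now();
    let mut out: Vec<R> = data[..sample].iter().map(&f).collect();
    let per_element = start.elapsed() / sample.max(1) as u32;
    // Cheap per-element work or a small remainder: stay sequential.
    if per_element.as_micros() < 100 || data.len() - sample < 1024 {
        out.extend(data[sample..].iter().map(&f));
        return out;
    }
    // Heavy work: fan the remainder out over scoped threads, one chunk per core.
    let threads = std::thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let rest = &data[sample..];
    let chunk = rest.len().div_ceil(threads).max(1);
    let f = &f;
    let mapped: Vec<Vec<R>> = std::thread::scope(|s| {
        let handles: Vec<_> = rest
            .chunks(chunk)
            .map(|part| s.spawn(move || part.iter().map(f).collect::<Vec<R>>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    out.extend(mapped.into_iter().flatten());
    out
}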
use avx_parallel::prelude::*;
let data = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
// Lock-free counting with atomics
let count = lockfree_count(&data, |x| x > &5);
// Lock-free search with early exit
let has_large = lockfree_any(&data, |x| x > &100);
let all_positive = lockfree_all(&data, |x| x > &0);
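The lock-free helpers rely on atomics rather than a mutex. As a rough illustration of that pattern (not the crate's actual code), counting can be done with scoped threads that each scan a chunk and publish their tally with a single fetch_add on a shared AtomicUsize:
use std::sync::atomic::{AtomicUsize, Ordering};
// Illustration of atomic (lock-free) counting; not avx-parallel's internals.
fn count_matching<T: Sync>(data: &[T], pred: impl Fn(&T) -> bool + Sync) -> usize {
    let counter = AtomicUsize::new(0);
    let threads = std::thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let chunk = data.len().div_ceil(threads).max(1);
    std::thread::scope(|s| {
        for part in data.chunks(chunk) {
            let (counter, pred) = (&counter, &pred);
            s.spawn(move || {
                let local = part.iter().filter(|&x| pred(x)).count();
                // One atomic add per thread keeps contention negligible.
                counter.fetch_add(local, Ordering::Relaxed);
            });
        }
    });
    counter.load(Ordering::Relaxed)
}
// count_matching(&[1, 2, 3, 4, 5], |&x| x > 2) == 3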
use avx_parallel::adaptive::AdaptiveExecutor;
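// `expensive_op` is a placeholder for any CPU-heavy per-element function,
// and `data` is the Vec from the previous snippet.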
// Executor learns optimal chunk size over time
let mut executor = AdaptiveExecutor::new();
// First run: learns optimal parameters
let result1 = executor.execute(&data, |x| expensive_op(x));
// Subsequent runs: uses learned optimal chunk size
let result2 = executor.execute(&data, |x| expensive_op(x));
use avx_parallel::memory::parallel_transform_inplace;
// Zero-allocation in-place transformation
let mut data = vec![1, 2, 3, 4, 5];
parallel_transform_inplace(&mut data, |x| *x *= 2);
// data is now [2, 4, 6, 8, 10] without any allocations
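The zero-allocation claim comes from mutating the buffer in place rather than collecting into a new Vec. A minimal std-only sketch of that pattern (an illustration, not the crate's source): split the slice into disjoint mutable chunks and give each scoped thread one chunk.
// Illustration of in-place parallel mutation with std only.
fn transform_inplace<T: Send>(data: &mut [T], f: impl Fn(&mut T) + Sync) {
    let threads = std::thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let chunk = data.len().div_ceil(threads).max(1);
    let f = &f;
    std::thread::scope(|s| {
        // chunks_mut yields non-overlapping &mut [T] slices, so no locking is needed.
        for part in data.chunks_mut(chunk) {
            s.spawn(move || part.iter_mut().for_each(f));
        }
    });
}
// transform_inplace(&mut data, |x| *x *= 2); // same effect as the call above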
use avx_parallel::{work_stealing_map, WorkStealingPool};
// Dynamic load balancing
let data = vec![1, 2, 3, 4, 5];
let results = work_stealing_map(&data, |x| expensive_computation(x));
// Custom work stealing pool
let pool = WorkStealingPool::new(4);
pool.execute(tasks);
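In the snippet above, expensive_computation and tasks are placeholders for user-supplied work. The crate's deque-based work stealing is not reproduced here, but the following plain-std sketch shows the simpler self-scheduling idea behind dynamic load balancing: worker threads repeatedly claim the next unprocessed chunk from a shared atomic counter, so faster threads naturally end up doing more chunks.
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
// Self-scheduling map: an illustration of dynamic load balancing, not the
// crate's deque-based work stealing.
fn dynamic_map<T: Sync, R: Send>(data: &[T], chunk: usize, f: impl Fn(&T) -> R + Sync) -> Vec<R> {
    let chunk = chunk.max(1);
    let next = AtomicUsize::new(0);
    let results = Mutex::new(Vec::new()); // (chunk index, mapped chunk)
    let threads = std::thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    std::thread::scope(|s| {
        for _ in 0..threads {
            let (next, results, f) = (&next, &results, &f);
            s.spawn(move || loop {
                // Claim the next chunk; stop once every chunk has been taken.
                let i = next.fetch_add(1, Ordering::Relaxed);
                let start = i * chunk;
                if start >= data.len() {
                    break;
                }
                let end = (start + chunk).min(data.len());
                let mapped: Vec<R> = data[start..end].iter().map(f).collect();
                results.lock().unwrap().push((i, mapped));
            });
        }
    });
    // Restore input order before flattening.
    let mut parts = results.into_inner().unwrap();
    parts.sort_by_key(|&(i, _)| i);
    parts.into_iter().flat_map(|(_, v)| v).collect()
}
// let results = dynamic_map(&data, 256, |&x| x * 2);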
use avx_parallel::simd;
let data: Vec<i32> = (1..=1_000_000).collect();
let sum = simd::parallel_simd_sum_i32(&data);
let a: Vec<f32> = vec![1.0, 2.0, 3.0];
let b: Vec<f32> = vec![4.0, 5.0, 6.0];
let dot = simd::simd_dot_f32(&a, &b);
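Portable SIMD in the standard library is still nightly-only, so a common way to get SIMD on stable Rust is to shape the loop so the compiler can auto-vectorize it: fixed-width chunks with independent accumulators. The sketch below illustrates that idea; it is not necessarily what avx-parallel's simd module does internally.
// Auto-vectorization-friendly sum: eight independent accumulators let the
// optimizer emit packed (e.g. AVX2) adds in release builds on most x86-64 CPUs.
fn simd_friendly_sum_i32(data: &[i32]) -> i64 {
    let mut lanes = [0i64; 8];
    let mut chunks = data.chunks_exact(8);
    for chunk in &mut chunks {
        for (lane, &x) in lanes.iter_mut().zip(chunk) {
            *lane += x as i64;
        }
    }
    // Fold in the tail that didn't fill a whole 8-wide chunk.
    let tail: i64 = chunks.remainder().iter().map(|&x| x as i64).sum();
    lanes.iter().sum::<i64>() + tail
}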
use avx_parallel::{ThreadPoolConfig, set_global_config};
let config = ThreadPoolConfig::new()
.num_threads(8)
.min_chunk_size(2048)
.thread_name("my-worker");
set_global_config(config);
use avx_parallel::parallel_sort;
let mut data = vec![5, 2, 8, 1, 9];
parallel_sort(&mut data);
// data is now [1, 2, 5, 8, 9]
use avx_parallel::executor::*;
let data = vec![1, 2, 3, 4, 5];
// Parallel map
let results = parallel_map(&data, |x| x * 2);
// Parallel filter
let evens = parallel_filter(&data, |x| *x % 2 == 0);
// Parallel reduce
let sum = parallel_reduce(&data, |a, b| a + b);
// Parallel partition
let (evens, odds) = parallel_partition(&data, |x| *x % 2 == 0);
// Find first matching
let found = parallel_find(&data, |x| *x > 3);
// Count matching
let count = parallel_count(&data, |x| *x % 2 == 0);
use avx_parallel::prelude::*;
let mut data = vec![1, 2, 3, 4, 5];
data.par_iter_mut()
.for_each(|x| *x *= 2);
println!("{:?}", data); // [2, 4, 6, 8, 10]
Under the hood, the library uses:
- std::thread::scope for lifetime-safe thread spawning
- std::thread::available_parallelism() to detect the number of CPU cores
- Arc<Mutex<>> for safe result collection

(A minimal sketch of this pattern appears after the configuration notes below.)
Default Configuration:
const MIN_CHUNK_SIZE: usize = 1024; // Optimized based on benchmarks
const MAX_CHUNKS_PER_THREAD: usize = 8;
Environment Variables:
# Customize minimum chunk size (useful for tuning specific workloads)
export avx_MIN_CHUNK_SIZE=2048
# Run your program
cargo run --release
When to Adjust: increase the minimum chunk size when per-element work is very cheap (so scheduling overhead stays small); decrease it when each element is expensive (so work spreads more evenly across threads).
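Putting the pieces together, here is a minimal sketch of the execution pattern described above: detect the core count, split the input into chunks no smaller than the configured floor (honoring the environment variable), run each chunk on a scoped thread, and gather results behind a mutex. Scoped threads can borrow the Mutex directly; the Arc<Mutex<>> mentioned earlier is the equivalent for non-scoped threads. This is an illustration of the pattern, not the crate's actual source.
use std::sync::Mutex;
const MIN_CHUNK_SIZE: usize = 1024;
// Read the override shown above, falling back to the built-in floor.
fn min_chunk_size() -> usize {
    std::env::var("avx_MIN_CHUNK_SIZE")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(MIN_CHUNK_SIZE)
}
// Sketch of a scoped parallel map: core detection, chunking, mutex-guarded collection.
fn scoped_parallel_map<T: Sync, R: Send>(data: &[T], f: impl Fn(&T) -> R + Sync) -> Vec<R> {
    let cores = std::thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let chunk = data.len().div_ceil(cores).max(min_chunk_size());
    let results = Mutex::new(Vec::new()); // (chunk index, mapped chunk)
    std::thread::scope(|s| {
        for (i, part) in data.chunks(chunk).enumerate() {
            let (results, f) = (&results, &f);
            s.spawn(move || {
                let mapped: Vec<R> = part.iter().map(f).collect();
                results.lock().unwrap().push((i, mapped));
            });
        }
    });
    // Re-order by chunk index so the output matches the input order.
    let mut parts = results.into_inner().unwrap();
    parts.sort_by_key(|&(i, _)| i);
    parts.into_iter().flat_map(|(_, v)| v).collect()
}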
use avx_parallel::prelude::*;
let data: Vec<i32> = (0..10_000_000).collect();
// Perform expensive computation in parallel
let results: Vec<i32> = data.par_vec()
.map(|&x| {
// Simulate expensive operation
let mut result = x;
for _ in 0..100 {
result = (result * 13 + 7) % 1_000_000;
}
result
})
.collect();
use avx_parallel::prelude::*;
let data: Vec<f64> = vec![1.0, 2.0, 3.0, 4.0, 5.0];
// Calculate statistics in parallel
let sum: f64 = data.par_iter().sum();
let count = data.len();
let mean = sum / count as f64;
let variance = data.par_vec()
.map(|&x| (x - mean).powi(2))
.into_iter()
.sum::<f64>() / count as f64;
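If the standard deviation is needed as well, it follows directly from the variance computed above:
let std_dev = variance.sqrt();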
git clone https://github.com/avilaops/arxis
cd arxis
cargo build --release
cargo test
MIT License - see LICENSE file for details
Contributions are welcome! Please feel free to submit a Pull Request.
Full API documentation is available at docs.rs/avx-parallel
If you find this project useful, consider giving it a star!