| Crates.io | wgpu-algorithms |
| lib.rs | wgpu-algorithms |
| version | 0.1.0 |
| created_at | 2026-01-10 18:32:11.266625+00 |
| updated_at | 2026-01-10 18:32:11.266625+00 |
| description | High-performance, safe wgpu Radix Sort and Prefix Scan for Rust. |
| homepage | |
| repository | https://github.com/samjsui/wgpu-algorithms |
| max_upload_size | |
| id | 2034523 |
| size | 102,778 |
A high-performance, safe WebGPU sorting and scanning library for Rust.
Safe Rust API. All memory management, bind groups, and synchronization are handled internally. No unsafe blocks in library code.
Benchmarks run on Apple M3 Max (Metal backend).
Sort vs. CPU (Rayon par_sort_unstable):

| Items | CPU (Rayon) | GPU (Resident) | GPU (Round-Trip) | Verdict |
|---|---|---|---|---|
| 100k | 0.52 ms | 6.0 ms | 7.2 ms | ❌ CPU Wins (Driver Overhead) |
| 1M | 4.5 ms | 9.1 ms | 10.1 ms | ❌ CPU Wins |
| 10M | 44.1 ms | 31.3 ms | 40.9 ms | ✅ GPU Wins (1.4x) |
| 100M | 506 ms | 273 ms | 407 ms | 🚀 GPU Domination (1.85x) |
Throughput (100M items):
Benchmarks include driver submission overhead (queue.submit + device.poll).
| Items | Time | Throughput | Bandwidth (Effective) |
|---|---|---|---|
| 100M | 19.2 ms | 5.2 Gelem/s | ~41.6 GB/s |
Note: Bandwidth calculated as Read + Write (4 bytes * 2 * items / time).
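The arithmetic behind the table row can be reproduced with a few lines of Rust. This is only a sketch of the formula in the note above; the helper names `throughput_gelems` and `effective_bandwidth_gbs` are made up for illustration and are not part of the crate.

```rust
/// Elements processed per second, expressed in Gelem/s.
fn throughput_gelems(items: u64, time_ms: f64) -> f64 {
    items as f64 / (time_ms / 1e3) / 1e9
}

/// Effective bandwidth in GB/s, counting one read and one write per element
/// (bytes_per_item * 2 * items / time), as in the note above.
fn effective_bandwidth_gbs(items: u64, bytes_per_item: u64, time_ms: f64) -> f64 {
    (bytes_per_item * 2 * items) as f64 / (time_ms / 1e3) / 1e9
}

fn main() {
    let items = 100_000_000; // 100M u32 elements
    let time_ms = 19.2; // measured time from the table

    println!("{:.1} Gelem/s", throughput_gelems(items, time_ms));
    println!("{:.1} GB/s", effective_bandwidth_gbs(items, 4, time_ms));
}
```

Computed directly from 19.2 ms, the bandwidth comes out near 41.7 GB/s. The table's ~41.6 GB/s matches multiplying the rounded 5.2 Gelem/s by 8 bytes.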
The library implements state-of-the-art parallel algorithms tailored for the WebGPU execution model:
… (VT=8 for Desktop, VT=4 for Mobile) to saturate memory bandwidth.

use wgpu_algorithms::{Context, Sorter};

// Requires the tokio and rand crates in addition to wgpu-algorithms.
#[tokio::main]
async fn main() {
    // 1. Initialize GPU Context
    let ctx = Context::init().await.unwrap();
    let mut sorter = Sorter::new(&ctx);

    // 2. Data
    let data: Vec<u32> = (0..50_000_000).map(|_| rand::random()).collect();

    // 3. Sort (Auto-selects CPU or GPU based on size)
    let sorted = sorter.sort(&data).await;

    // 4. Verification: every adjacent pair must be in non-decreasing order
    assert!(sorted.windows(2).all(|w| w[0] <= w[1]));
}
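When validating GPU output, it can help to have a plain CPU reference for the two primitives the crate description names, radix sort and prefix scan. The sketch below is a sequential, single-threaded illustration of how an exclusive scan turns digit histograms into scatter offsets in an LSD radix sort. It is not the library's GPU implementation, and `exclusive_scan` and `radix_sort_reference` are hypothetical names chosen for this example.

```rust
/// Exclusive prefix scan: out[i] is the sum of input[..i].
fn exclusive_scan(input: &[u32]) -> Vec<u32> {
    let mut out = Vec::with_capacity(input.len());
    let mut running = 0u32;
    for &x in input {
        out.push(running);
        running = running.wrapping_add(x);
    }
    out
}

/// LSD radix sort over 8-bit digits (4 passes for u32 keys).
fn radix_sort_reference(data: &[u32]) -> Vec<u32> {
    let mut src = data.to_vec();
    let mut dst = vec![0u32; data.len()];
    for pass in 0..4u32 {
        let shift = pass * 8;

        // 1. Histogram: count how many keys fall into each digit bucket.
        let mut counts = [0u32; 256];
        for &x in &src {
            counts[((x >> shift) & 0xFF) as usize] += 1;
        }

        // 2. Exclusive scan turns bucket counts into starting offsets.
        let mut offsets = exclusive_scan(&counts);

        // 3. Stable scatter: keys keep their relative order within a bucket.
        for &x in &src {
            let digit = ((x >> shift) & 0xFF) as usize;
            dst[offsets[digit] as usize] = x;
            offsets[digit] += 1;
        }
        std::mem::swap(&mut src, &mut dst);
    }
    src
}

fn main() {
    let data: Vec<u32> = vec![170, 45, 75, 90, 802, 24, 2, 66, 4_000_000_000];
    let sorted = radix_sort_reference(&data);

    // Compare against the standard library sort.
    let mut expected = data.clone();
    expected.sort_unstable();
    assert_eq!(sorted, expected);

    assert_eq!(exclusive_scan(&[3, 1, 4, 1]), vec![0, 3, 4, 8]);
}
```

A reference like this is only practical for small inputs, but comparing it against the library's result on a few thousand random keys is a stronger check than inspecting the first two elements.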
Add this to your Cargo.toml:
[dependencies]
wgpu-algorithms = "0.1.0"