| Crates.io | runmat-accelerate |
| lib.rs | runmat-accelerate |
| version | 0.2.8 |
| created_at | 2025-10-18 18:01:44.896528+00 |
| updated_at | 2025-12-22 21:41:20.086317+00 |
| description | Pluggable GPU acceleration layer for RunMat (CUDA, ROCm, Metal, Vulkan/Spir-V) |
| homepage | |
| repository | |
| max_upload_size | |
| id | 1889475 |
| size | 1,820,526 |
runmat-accelerate provides the high-level acceleration layer that integrates GPU backends with the language runtime. It implements provider(s) for runmat-accelerate-api so that gpuArray, gather, and accelerated math and linear algebra can execute on devices transparently where appropriate.
How it works:
- This crate implements and registers an AccelProvider (from runmat-accelerate-api) at startup.
- Backends (wgpu, cuda, rocm, metal, vulkan, opencl) are feature-gated. Only one provider is registered globally, but a future multi-device planner can fan out.
- The Planner decides when to run ops on CPU vs GPU (size thresholds, op types, fusion opportunities); the Accelerator exposes ergonomic entry points used by the runtime or higher layers, with automatic CPU/GPU routing: the planner chooses the CPU path (delegating to runmat-runtime) or the GPU path (via provider methods) based on tensor sizes, operation types, and fusion opportunities.
- The gpuArray and gather builtins live in runmat-runtime for consistency with all other builtins. When a provider is registered, they route through the provider API defined in runmat-accelerate-api, calling runmat-accelerate-api::provider(), which is implemented and registered by this crate.
- wgpu (feature: wgpu) is the primary cross-vendor backend, supporting Metal (macOS), DirectX 12 (Windows), and Vulkan (Linux) through a single portable implementation.

The provider is registered at process startup (REPL/CLI/app). Once registered, MATLAB-like code can use:
G = gpuArray(A); % move tensor to device
H = G + 2; % elementwise add (planner may choose GPU path)
R = gather(H); % bring results back to host
RunMat Accelerate powers native acceleration for RunMat. Instead of calling the gpuArray/gather builtins explicitly, users can write ordinary array code and let the planner accelerate it:
% Example: Large matrix multiplication and elementwise operations
A = randn(10000, 10000); % Large matrix
B = randn(10000, 10000);
% Normally, in MATLAB you'd need to explicitly use gpuArray:
% G = gpuArray(A);
% H = G .* B; % Elementwise multiply on GPU
% S = sum(H, 2);
% R = gather(S);
% With RunMat Accelerate, the planner can transparently move data to the GPU
% and back as needed, so you can just write:
H = A .* B; % Planner may choose GPU for large ops
S = sum(H, 2); % Fused and executed on device if beneficial
% Results are automatically brought back to host as needed
disp(S(1:10)); % Print first 10 results
% The planner/JIT will optimize transfers and fuse operations for best performance.
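Because routing is size-aware, small workloads simply stay on the CPU path. A minimal sketch of the expected behavior (the actual thresholds are planner/provider dependent, so whether any given op moves to the GPU is not guaranteed):
a = randn(8, 8); % small input: below typical planner size thresholds
b = a .* a + 1; % planner keeps this on the CPU path, avoiding transfer overhead
disp(b(1, 1)); % no gather needed; the result is already on the host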
gpuDevice() returns a structured value with details about the active provider/device when available. Fields include device_id, name, vendor, optional memory_bytes, and optional backend.
info = gpuDevice();
% Example output (in-process provider):
% struct with fields:
% device_id: 0
% name: 'InProcess'
% vendor: 'RunMat'
% backend: 'inprocess'
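A small usage sketch of the fields above. Since memory_bytes and backend are optional, code should guard access to them; isfield and fprintf are assumed here to behave as their MATLAB counterparts:
info = gpuDevice();
fprintf('device %d: %s (%s)\n', info.device_id, info.name, info.vendor);
if isfield(info, 'memory_bytes')
    fprintf('memory: %d bytes\n', info.memory_bytes); % only present on some providers
end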
Reduction tuning is controlled by two environment variables:
- RUNMAT_TWO_PASS_THRESHOLD=<usize>: single-pass reduction is used when reduce_len <= threshold, otherwise two-pass (partials + second-stage reduce).
- RUNMAT_REDUCTION_WG=<u32>: reduction workgroup size; set wg=0 to use provider defaults.

Tune these with the wgpu_profile binary:
cargo run -p runmat-accelerate --bin wgpu_profile --features wgpu -- --only-reduce-sweep --reduce-sweep --quick
RUNMAT_TWO_PASS_THRESHOLD=1000000 cargo run -p runmat-accelerate --bin wgpu_profile --features wgpu -- --only-reduce-sweep --reduce-sweep
RUNMAT_TWO_PASS_THRESHOLD=1 cargo run -p runmat-accelerate --bin wgpu_profile --features wgpu -- --only-reduce-sweep --reduce-sweep

Cache statistics are available via runmat_accelerate::provider_cache_stats() (when built with wgpu) and are printed by wgpu_profile after a sweep. Pipeline metadata (.json) and shader WGSL (.wgsl) are persisted under the OS cache directory or RUNMAT_PIPELINE_CACHE_DIR (fallback target/tmp). runmat accel-info prints the last warmup duration in milliseconds.
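The other knobs compose the same way. For example, a sweep with an explicit workgroup size, and one with a custom pipeline-cache location (the specific values here are illustrative, not recommendations):
RUNMAT_REDUCTION_WG=256 RUNMAT_TWO_PASS_THRESHOLD=262144 cargo run -p runmat-accelerate --bin wgpu_profile --features wgpu -- --only-reduce-sweep --reduce-sweep
RUNMAT_PIPELINE_CACHE_DIR=target/tmp/pipelines cargo run -p runmat-accelerate --bin wgpu_profile --features wgpu -- --only-reduce-sweep --reduce-sweep --quick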