| Field | Value |
|---|---|
| Crates.io | optirs-gpu |
| lib.rs | optirs-gpu |
| version | 0.1.0 |
| created_at | 2025-09-18 09:38:26.71799+00 |
| updated_at | 2025-12-30 08:25:56.593895+00 |
| description | OptiRS GPU acceleration and multi-GPU optimization |
| homepage | |
| repository | https://github.com/cool-japan/optirs |
| max_upload_size | |
| id | 1844478 |
| size | 1,506,654 |
GPU acceleration and multi-GPU optimization for the OptiRS machine learning optimization library.
OptiRS-GPU provides hardware acceleration for machine learning optimization workloads across multiple GPU backends. The crate supports high-performance parallel optimization on CUDA, Metal, OpenCL, and WebGPU, with automatic device selection and memory management.
Add this to your Cargo.toml:
```toml
[dependencies]
optirs-gpu = "0.1.0"
scirs2-core = "0.1.1"  # Required foundation
```
Enable specific GPU backends:
```toml
[dependencies]
optirs-gpu = { version = "0.1.0", features = ["cuda", "metal"] }
```
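Because each backend is an optional Cargo feature, downstream code can also gate backend-specific paths at compile time with standard `cfg` attributes. A minimal sketch (the helper function is hypothetical, not part of the optirs-gpu API):

```rust
// Hypothetical helper, not part of the optirs-gpu API: picks a backend
// name at compile time based on which Cargo feature is enabled.
#[cfg(feature = "cuda")]
fn preferred_backend() -> &'static str {
    "cuda"
}

#[cfg(not(feature = "cuda"))]
fn preferred_backend() -> &'static str {
    "wgpu" // the default feature
}
```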
Available features:
- `cuda`: NVIDIA CUDA support
- `metal`: Apple Metal support
- `opencl`: OpenCL support
- `wgpu`: WebGPU support (enabled by default)

A minimal single-GPU example:

```rust
use optirs_gpu::{GpuOptimizer, DeviceManager};
use optirs_core::optimizers::Adam;

// Initialize the GPU device manager and pick the best available device
let device_manager = DeviceManager::new().await?;
let device = device_manager.select_best_device()?;

// Create a GPU-accelerated optimizer
let mut optimizer = GpuOptimizer::new(device)
    .with_optimizer(Adam::new(0.001))
    .build()?;

// Model parameters (automatically transferred to the GPU);
// `gradient_data` is the flattened gradient buffer from your backward pass
let mut params = optimizer.create_tensor(&[1024, 512])?;
let grads = optimizer.create_tensor_from_slice(&gradient_data)?;

// Perform an optimization step on the GPU
optimizer.step(&mut params, &grads).await?;
```
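In practice the step above runs once per mini-batch. A hedged sketch of the surrounding loop, using only the calls shown above (`num_epochs`, `data_loader`, and `compute_gradients` are hypothetical stand-ins for your own training plumbing):

```rust
// Hypothetical training loop: `num_epochs`, `data_loader`, and
// `compute_gradients` stand in for your model's own machinery.
for _epoch in 0..num_epochs {
    for batch in data_loader.iter() {
        let gradient_data = compute_gradients(&batch);
        let grads = optimizer.create_tensor_from_slice(&gradient_data)?;
        optimizer.step(&mut params, &grads).await?;
    }
}
```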
Multi-GPU training with synchronized updates:

```rust
use optirs_gpu::{MultiGpuOptimizer, DataParallelStrategy};

// Set up multi-GPU training with an all-reduce strategy
let mut multi_gpu = MultiGpuOptimizer::new()
    .with_strategy(DataParallelStrategy::AllReduce)
    .with_devices(&device_manager.available_devices())
    .build()?;

// Distribute the model across GPUs
multi_gpu.distribute_model(&model_parameters).await?;

// Synchronized optimization step across all GPUs
multi_gpu.step_synchronized(&gradients).await?;
```
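For intuition, `DataParallelStrategy::AllReduce` averages each gradient element across devices so every GPU applies the same update. An illustrative CPU-side version of that reduction (not the library's implementation):

```rust
// Illustrative CPU-side all-reduce: element-wise mean of per-device
// gradients. For intuition only; not OptiRS's implementation.
fn allreduce_mean(per_device_grads: &[Vec<f32>]) -> Vec<f32> {
    let n = per_device_grads.len() as f32;
    let mut out = vec![0.0f32; per_device_grads[0].len()];
    for grads in per_device_grads {
        for (acc, g) in out.iter_mut().zip(grads) {
            *acc += *g;
        }
    }
    out.iter_mut().for_each(|acc| *acc /= n);
    out
}
```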
Direct access to the CUDA backend:

```rust
use optirs_gpu::cuda::{CudaContext, CudaStream};

let cuda_ctx = CudaContext::new(device_id)?;
let stream = CudaStream::new(&cuda_ctx)?;

// Launch a custom CUDA kernel on the stream
let result = stream.launch_kernel("custom_optimizer", &params, &config).await?;
```
Direct access to the Metal backend on Apple platforms:

```rust
use optirs_gpu::metal::{MetalDevice, MetalCommandQueue};

let metal_device = MetalDevice::system_default()?;
let command_queue = metal_device.new_command_queue();

// Metal Performance Shaders (MPS) integration
let mps_optimizer = command_queue.create_mps_optimizer(&config)?;
```
OptiRS-GPU provides comprehensive error handling for GPU operations:
```rust
use optirs_gpu::error::{GpuError, GpuResult};

match optimizer.step(&mut params, &grads).await {
    Ok(()) => println!("Optimization successful"),
    Err(GpuError::OutOfMemory) => {
        // Handle GPU memory exhaustion
        optimizer.clear_cache()?;
    }
    Err(GpuError::DeviceNotAvailable) => {
        // Fall back to CPU optimization
        fallback_to_cpu_optimizer()?;
    }
    Err(e) => return Err(e.into()),
}
```
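One way to act on `OutOfMemory` is to clear the cache and retry once before propagating the error. A sketch using only the calls shown above (`GpuTensor` is a placeholder name for the tensor type returned by `create_tensor`):

```rust
// Hypothetical retry wrapper: on OutOfMemory, clear the GPU cache once
// and retry the step before propagating the error. `GpuTensor` is a
// placeholder name for the tensor type returned by `create_tensor`.
async fn step_with_retry(
    optimizer: &mut GpuOptimizer,
    params: &mut GpuTensor,
    grads: &GpuTensor,
) -> GpuResult<()> {
    match optimizer.step(params, grads).await {
        Err(GpuError::OutOfMemory) => {
            optimizer.clear_cache()?;
            optimizer.step(params, grads).await
        }
        other => other,
    }
}
```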
OptiRS-GPU includes built-in benchmarking tools:
```rust
use optirs_gpu::benchmarks::{GpuBenchmark, BenchmarkConfig};

let benchmark = GpuBenchmark::new()
    .with_config(BenchmarkConfig::default())
    .with_optimizer(optimizer)
    .build()?;

let results = benchmark.run_performance_suite().await?;
println!("GPU throughput: {:.2} GFLOPS", results.throughput);
```
Backend availability by platform:

| Platform | CUDA | Metal | OpenCL | WebGPU |
|---|---|---|---|---|
| Linux | ✅ | ❌ | ✅ | ✅ |
| macOS | ❌ | ✅ | ✅ | ✅ |
| Windows | ✅ | ❌ | ✅ | ✅ |
| Web | ❌ | ❌ | ❌ | ✅ |
OptiRS follows the Cool Japan organization's development standards. See the main OptiRS repository for contribution guidelines.
This project is licensed under either of:

- Apache License, Version 2.0
- MIT License

at your option.