easy-async-opencl3

Crates.io	easy-async-opencl3
lib.rs	easy-async-opencl3
version	0.2.4
created_at	2026-01-19 00:31:01.154436+00
updated_at	2026-01-25 17:06:09.23831+00
description	A declarative, multi-device asynchronous executor for OpenCL based on cl3.
repository	https://github.com/wayavl/easy-async-cl3
id	2053381
size	234,455

(Wayavl)

easy-async-cl3

A high-level, async-first Rust wrapper for OpenCL with intelligent multi-device management and declarative task execution.

Overview

easy-async-cl3 provides a modern, ergonomic interface to OpenCL that embraces Rust's async/await paradigm. The library automatically manages resources, distributes work across multiple devices, and provides compile-time safety guarantees.
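
To illustrate what "async-first" can mean for device work, here is a minimal, standard-library-only sketch of bridging a completion callback into a future. It is a generic pattern, not this crate's implementation: `CompletionFuture`, `enqueue_simulated`, and `block_on` are hypothetical names, and a sleeping thread stands in for the device.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;
use std::time::Duration;

/// State shared between the future and the (simulated) completion callback.
struct Shared {
    result: Option<u64>,
    waker: Option<Waker>,
}

/// Hypothetical future that resolves once the simulated device work is done.
struct CompletionFuture {
    shared: Arc<Mutex<Shared>>,
}

impl Future for CompletionFuture {
    type Output = u64;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u64> {
        let mut shared = self.shared.lock().unwrap();
        match shared.result.take() {
            Some(value) => Poll::Ready(value),
            None => {
                // Remember who to wake when the callback fires.
                shared.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

/// Assumption: a thread plays the role of the device and its event callback.
fn enqueue_simulated(work_ms: u64) -> CompletionFuture {
    let shared = Arc::new(Mutex::new(Shared { result: None, waker: None }));
    let callback_state = Arc::clone(&shared);
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(work_ms));
        let mut state = callback_state.lock().unwrap();
        state.result = Some(work_ms);
        if let Some(waker) = state.waker.take() {
            waker.wake();
        }
    });
    CompletionFuture { shared }
}

/// Waker that unparks the thread driving the future.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Tiny single-future executor, used here only so the sketch needs no runtime crate.
fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = Box::pin(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let elapsed = block_on(async { enqueue_simulated(10).await });
    println!("simulated device work finished after {elapsed} ms");
}
```

In a real application the Tokio runtime would take the place of `block_on`; the point is only that the calling task is suspended, not blocked, until the completion signal arrives.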

Key Features

Async/Await Integration: All GPU operations return futures for seamless async workflows
Automatic Multi-Device Support: Intelligent work distribution across multiple GPUs based on device capabilities
Type-Safe API: Compile-time guarantees prevent common errors (e.g., using unbuilt programs)
Declarative Task Building: Fluent builder pattern for constructing GPU tasks
Zero-Cost Abstractions: RAII-based resource management with no runtime overhead
Comprehensive OpenCL Support: Full support for OpenCL 1.1 through 3.0 features including Pipes, SVM, and Images
Built-in Profiling: Optional performance measurement with negligible overhead
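
The "using unbuilt programs" guarantee mentioned above is the kind of thing the typestate pattern can provide. The sketch below shows the general idea with hypothetical types (`Program<Unbuilt>`, `Program<Built>`, `has_kernel`); it is not the crate's actual API, and the "build" step is a placeholder for invoking the OpenCL compiler.

```rust
use std::marker::PhantomData;

/// Marker types for the two program states (hypothetical names).
struct Unbuilt;
struct Built;

/// A program whose build state is tracked in the type parameter.
struct Program<State> {
    source: String,
    _state: PhantomData<State>,
}

impl Program<Unbuilt> {
    fn from_source(source: &str) -> Self {
        Program { source: source.to_string(), _state: PhantomData }
    }

    /// Consumes the unbuilt program. A real wrapper would compile here;
    /// this placeholder only rejects empty source.
    fn build(self) -> Result<Program<Built>, String> {
        if self.source.trim().is_empty() {
            return Err("empty program source".to_string());
        }
        Ok(Program { source: self.source, _state: PhantomData })
    }
}

impl Program<Built> {
    /// Only callable on a built program. Placeholder lookup by substring.
    fn has_kernel(&self, name: &str) -> bool {
        self.source.contains(name)
    }
}

fn main() {
    let program = Program::from_source("kernel void vector_add() {}");
    // program.has_kernel("vector_add"); // does not compile: no such method on Program<Unbuilt>
    let built = program.build().expect("build failed");
    println!("has vector_add: {}", built.has_kernel("vector_add"));
}
```

Because `build` takes `self` by value, the unbuilt handle cannot be reused afterwards, so the mistake is rejected by the compiler rather than reported at runtime.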

Installation

Add this to your Cargo.toml:

[dependencies]
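# Published on crates.io as easy-async-opencl3; the rename keeps the easy_async_cl3 import path used in the examples.
easy-async-cl3 = { package = "easy-async-opencl3", version = "0.2" }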
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }

Quick Start

use easy_async_cl3::{
    async_executor::AsyncExecutor,
    cl_types::memory_flags::MemoryFlags,
    error::ClError,
};

#[tokio::main]
async fn main() -> Result<(), ClError> {
    // Initialize executor with best available platform
    let executor = AsyncExecutor::new_best_platform()?;
    
    // Define and build kernel
    let source = r#"
        kernel void vector_add(global float* a, global const float* b) {
            size_t i = get_global_id(0);
            a[i] += b[i];
        }
    "#;
    let program = executor.build_program(source.to_string(), None)?;
    let kernel = executor.create_kernel(&program, "vector_add")?;
    
    // Prepare data
    let size = 1_000_000;
    let mut a = vec![1.0f32; size];
    let b = vec![2.0f32; size];
    
    // Create GPU buffers
    let buf_a = executor.create_buffer(
        &[MemoryFlags::ReadWrite, MemoryFlags::CopyHostPtr],
        size * std::mem::size_of::<f32>(),
        a.as_mut_ptr() as *mut _
    )?;
    let buf_b = executor.create_buffer(
        &[MemoryFlags::ReadOnly, MemoryFlags::CopyHostPtr],
        size * std::mem::size_of::<f32>(),
        b.as_ptr() as *mut _
    )?;
    
    // Bind both buffers, launch one work-item per element, then read buf_a back into `a`
    executor.create_task(kernel)
        .arg_buffer(0, &buf_a)
        .arg_buffer(1, &buf_b)
        .global_work_dims(size, 1, 1)
        .read_buffer(&buf_a, &mut a)
        .run()
        .await?;
    
    assert_eq!(a[0], 3.0);
    Ok(())
}
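
The Quick Start asserts only on `a[0]`. When validating a kernel it can help to compare the whole result against a CPU reference. The following is a small sketch of that idea in plain Rust; `vector_add_reference` and `first_mismatch` are hypothetical helpers, not part of the crate, and the "device" data is a stand-in vector.

```rust
/// CPU reference for the `vector_add` kernel: a[i] += b[i].
fn vector_add_reference(a: &mut [f32], b: &[f32]) {
    for (x, y) in a.iter_mut().zip(b) {
        *x += *y;
    }
}

/// Index of the first element that differs by more than `tol`, if any.
/// A length mismatch is reported at the end of the shorter slice.
fn first_mismatch(got: &[f32], want: &[f32], tol: f32) -> Option<usize> {
    if got.len() != want.len() {
        return Some(got.len().min(want.len()));
    }
    got.iter().zip(want).position(|(g, w)| (g - w).abs() > tol)
}

fn main() {
    let size = 8;
    let mut expected = vec![1.0f32; size];
    let b = vec![2.0f32; size];
    vector_add_reference(&mut expected, &b);

    // Stand-in for the data read back from the device in the Quick Start.
    let from_device = vec![3.0f32; size];
    match first_mismatch(&from_device, &expected, 1e-6) {
        None => println!("all {size} elements match"),
        Some(i) => println!("mismatch at index {i}"),
    }
}
```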

Advanced Features

Multi-Device Execution

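The library automatically detects and utilizes all available compute devices. The example below also enables profiling, so the report returned by the task carries kernel timings: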

let executor = AsyncExecutor::new_best_platform_with_options(true)?; // Enable profiling

let report = executor.create_task(kernel)
    .arg_buffer(0, &buffer)
    .global_work_dims(10_000_000, 1, 1)
    .run()
    .await?;

println!("Execution time: {} μs", report.total_kernel_duration_ns() / 1000);

Shared Virtual Memory (OpenCL 2.0+)

Zero-copy memory sharing between CPU and GPU:

let mut svm_buffer = executor.create_svm_buffer::<f32>(
    &[MemoryFlags::ReadWrite], 
    1024
)?;

executor.create_task(kernel)
    .arg_svm(0, &svm_buffer)
    .global_work_dims(1024, 1, 1)
    .run()
    .await?;

// Direct CPU access without explicit copy
let queue = &executor.get_queues()[0];
let mapped = svm_buffer.map_mut(queue, &vec![MemoryFlags::ReadWrite])?;
println!("Result: {}", mapped[0]);

Image Processing

Native support for OpenCL images with hardware-accelerated filtering:

use easy_async_cl3::cl_types::cl_image::{
    ClImageFormats, ClImageDesc, image_type::ClImageType
};

let format = ClImageFormats::rgba_unorm_int8();
let desc = ClImageDesc {
    image_type: ClImageType::Image2D,
    image_width: Some(1920),
    image_height: Some(1080),
    ..Default::default()
};

let image = executor.create_image(
    &[MemoryFlags::ReadWrite],
    &format,
    &desc,
    std::ptr::null_mut()
)?;

executor.create_task(kernel)
    .arg_image(0, &image)
    .global_work_dims(1920, 1080, 1)
    .run()
    .await?;
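
When preparing host data for an image like the one above, two small calculations come up often: the size of the host buffer and the mapping between 8-bit channel values and the normalized floats a kernel sees. The sketch below assumes `rgba_unorm_int8` corresponds to OpenCL's CL_RGBA channel order with CL_UNORM_INT8 data (four one-byte channels, read as floats in [0.0, 1.0]); the helper names are hypothetical.

```rust
/// Number of channels in an RGBA pixel.
const CHANNELS_RGBA: usize = 4;

/// Bytes needed for a tightly packed RGBA image with one byte per channel.
fn rgba8_image_bytes(width: usize, height: usize) -> usize {
    width * height * CHANNELS_RGBA
}

/// Normalized value a kernel would read for an 8-bit UNORM channel.
fn unorm8_to_f32(value: u8) -> f32 {
    value as f32 / 255.0
}

/// Inverse mapping, clamped to the representable range and rounded to nearest.
fn f32_to_unorm8(value: f32) -> u8 {
    (value.clamp(0.0, 1.0) * 255.0).round() as u8
}

fn main() {
    let bytes = rgba8_image_bytes(1920, 1080);
    println!("1920x1080 RGBA8 host buffer: {bytes} bytes");
    println!("channel 128 reads as {:.4}", unorm8_to_f32(128));
    println!("0.5 is stored as {}", f32_to_unorm8(0.5));
}
```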

Pipes for Inter-Kernel Communication (OpenCL 2.0+)

Stream data between kernels without CPU involvement:

use easy_async_cl3::cl_types::cl_pipe::ClPipe;

let pipe = ClPipe::new(
    executor.get_context().as_ref(),
    &[MemoryFlags::ReadWrite],
    4,    // packet size (bytes)
    1024  // max packets
)?;

// Producer writes to pipe
executor.create_task(producer_kernel)
    .arg_pipe(0, &pipe)
    .global_work_dims(1024, 1, 1)
    .run()
    .await?;

// Consumer reads from pipe
executor.create_task(consumer_kernel)
    .arg_pipe(0, &pipe)
    .global_work_dims(1024, 1, 1)
    .run()
    .await?;
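
For intuition about the two numbers passed to `ClPipe::new`, an OpenCL pipe can be thought of as a bounded FIFO of fixed-size packets: writes fail once `max packets` are queued, and reads return packets in order. The host-side model below is only an analogy written for this README, with a hypothetical `PipeModel` type; real pipes live in device memory and are accessed from kernels.

```rust
use std::collections::VecDeque;

/// Host-side model of a pipe: a bounded FIFO of fixed-size packets.
struct PipeModel {
    packet_size: usize,
    max_packets: usize,
    packets: VecDeque<Vec<u8>>,
}

impl PipeModel {
    fn new(packet_size: usize, max_packets: usize) -> Self {
        PipeModel {
            packet_size,
            max_packets,
            packets: VecDeque::with_capacity(max_packets),
        }
    }

    /// Total payload capacity in bytes.
    fn capacity_bytes(&self) -> usize {
        self.packet_size * self.max_packets
    }

    /// Returns false when the pipe is full or the packet has the wrong size.
    fn write(&mut self, packet: &[u8]) -> bool {
        if packet.len() != self.packet_size || self.packets.len() == self.max_packets {
            return false;
        }
        self.packets.push_back(packet.to_vec());
        true
    }

    /// Returns the oldest packet, or None when the pipe is empty.
    fn read(&mut self) -> Option<Vec<u8>> {
        self.packets.pop_front()
    }
}

fn main() {
    // Same shape as the example above: 4-byte packets, at most 1024 of them.
    let mut pipe = PipeModel::new(4, 1024);
    for i in 0..1024u32 {
        assert!(pipe.write(&i.to_le_bytes()));
    }
    // The pipe is now full, so one more write is rejected.
    assert!(!pipe.write(&0u32.to_le_bytes()));
    println!("capacity: {} bytes", pipe.capacity_bytes());
    println!("first packet out: {:?}", pipe.read());
}
```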

Architecture

The library is structured in three main layers:

AsyncExecutor: High-level interface managing platforms, devices, and command queues
TaskBuilder: Declarative API for constructing and executing GPU tasks
CL Types: Type-safe wrappers around OpenCL objects (buffers, images, kernels, etc.)

Work is automatically distributed across available devices based on their compute capabilities and memory capacity.
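
The exact heuristic the library uses is not described here, but one plausible way to split a 1-D range across devices is proportionally to a per-device weight. The sketch below assumes such a weight exists (for example compute units multiplied by clock frequency); `split_work` is a hypothetical helper, not the crate's scheduler.

```rust
/// Split `total` work items across devices in proportion to `weights`.
/// Falls back to an even split when every weight is zero.
fn split_work(total: usize, weights: &[u64]) -> Vec<usize> {
    if weights.is_empty() {
        return Vec::new();
    }
    let all_zero = weights.iter().all(|&w| w == 0);
    let weights: Vec<u128> = weights
        .iter()
        .map(|&w| if all_zero { 1 } else { w as u128 })
        .collect();
    let sum: u128 = weights.iter().sum();

    // Floor of each proportional share; u128 avoids overflow in the product.
    let mut shares: Vec<usize> = weights
        .iter()
        .map(|&w| (total as u128 * w / sum) as usize)
        .collect();

    // Hand out the items lost to rounding, one each, to devices with nonzero weight.
    let mut leftover = total - shares.iter().sum::<usize>();
    for (share, &w) in shares.iter_mut().zip(&weights) {
        if leftover == 0 {
            break;
        }
        if w > 0 {
            *share += 1;
            leftover -= 1;
        }
    }
    shares
}

fn main() {
    // Two devices, the first assumed three times as capable as the second.
    let shares = split_work(10_000_000, &[3, 1]);
    println!("work items per device: {shares:?}");
}
```

The shares always sum to `total`, so each device can be given a contiguous offset range without gaps or overlap.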

Documentation

API Documentation - Complete API reference
Examples - Comprehensive usage examples
OpenCL Specification - Kernel programming reference

Requirements

Rust 1.70 or later
OpenCL runtime (provided by GPU vendor drivers)
Tokio async runtime

Use Cases

High-performance scientific computing
Real-time image and video processing
Machine learning inference and training
Cryptographic operations
Financial modeling and simulations
Parallel data analytics

Contributing

Contributions are welcome. Please ensure all tests pass and follow the existing code style.

License

Licensed under either of:

Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

Acknowledgments

Built on the cl3 library.

Commit count: 39