hodu_cuda_kernels

Crates.io: hodu_cuda_kernels
lib.rs: hodu_cuda_kernels
version: 0.2.4
created_at: 2025-11-10 18:58:51.391914+00
updated_at: 2025-11-13 22:01:14.058028+00
description: hodu cuda kernels
repository: https://github.com/hodu-rs/hodu
id: 1925945
size: 520,033
Han Damin (miniex)

hodu_cuda_kernels

High-performance CUDA kernels for tensor operations on NVIDIA GPUs.

cuBLAS Integration

Supported Operations

  • matmul: batched matrix multiplication via cuBLAS GEMM
  • dot: single (non-batched) 2D matrix multiplication via cuBLAS GEMM
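
As a point of reference for the semantics (not the GPU code path), here is a minimal CPU sketch of what the two operations compute, assuming contiguous row-major tensors with A shaped [batch, m, k] and B shaped [batch, k, n]. The function names `batched_matmul_ref` and `dot_ref` are hypothetical and are not part of the crate's API.

```rust
/// Hypothetical CPU reference for `matmul`: a batch of independent
/// row-major GEMMs, C[p] = A[p] * B[p].
/// Shapes: A is [batch, m, k], B is [batch, k, n], C is [batch, m, n].
fn batched_matmul_ref(a: &[f32], b: &[f32], batch: usize, m: usize, k: usize, n: usize) -> Vec<f32> {
    assert_eq!(a.len(), batch * m * k);
    assert_eq!(b.len(), batch * k * n);
    let mut c = vec![0.0f32; batch * m * n];
    for p in 0..batch {
        // Each batch slice starts at a fixed offset (a constant "batch stride").
        let (a_off, b_off, c_off) = (p * m * k, p * k * n, p * m * n);
        for i in 0..m {
            for j in 0..n {
                let mut acc = 0.0f32;
                for l in 0..k {
                    acc += a[a_off + i * k + l] * b[b_off + l * n + j];
                }
                c[c_off + i * n + j] = acc;
            }
        }
    }
    c
}

/// Hypothetical reference for `dot`: the batch == 1 case, a single 2D GEMM.
fn dot_ref(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    batched_matmul_ref(a, b, 1, m, k, n)
}

fn main() {
    let a = [1.0f32, 2.0, 3.0, 4.0]; // 2x2, row-major
    let b = [5.0f32, 6.0, 7.0, 8.0]; // 2x2, row-major
    println!("{:?}", dot_ref(&a, &b, 2, 2, 2)); // [19.0, 22.0, 43.0, 50.0]
}
```

Treating `dot` as the batch-of-one case of `matmul` is an assumption made for illustration; a batched GEMM simply repeats the same product over independent slices at fixed offsets.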

Supported Data Types

  • bf16: BFloat16 (compute in FP32, I/O in BF16)
  • f16: Float16/Half (compute in FP32, I/O in FP16)
  • f32: Float32 (native precision)
  • f64: Float64 (native precision)
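
The "compute in FP32, I/O in BF16" scheme can be illustrated on the CPU. The sketch below stores bf16 values as raw `u16` bits (bf16 is the upper 16 bits of an IEEE-754 f32), widens them to f32 for the arithmetic, and rounds the result back with round-to-nearest-even. All function names are hypothetical; this is not the crate's kernel code.

```rust
/// Widen bf16 (raw bits) to f32: bf16 is the top half of an f32 bit pattern.
fn bf16_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Narrow f32 to bf16 with round-to-nearest, ties-to-even.
fn f32_to_bf16(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Keep the sign and force a quiet-NaN bit so the payload stays NaN.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Add just under half an ulp, plus one more if the kept LSB is odd (ties to even).
    // Cannot overflow: the largest non-NaN pattern is 0xFF80_0000.
    let lsb = (bits >> 16) & 1;
    ((bits + 0x7FFF + lsb) >> 16) as u16
}

/// bf16 in / bf16 out, but every product and partial sum is kept in f32.
fn dot_bf16_f32acc(a: &[u16], b: &[u16]) -> u16 {
    let acc: f32 = a
        .iter()
        .zip(b.iter())
        .map(|(&x, &y)| bf16_to_f32(x) * bf16_to_f32(y))
        .sum();
    f32_to_bf16(acc) // round once, at the end
}

/// Same reduction, but the running sum is rounded back to bf16 after every step.
fn dot_bf16_bf16acc(a: &[u16], b: &[u16]) -> u16 {
    let mut acc: u16 = 0; // bf16 +0.0
    for (&x, &y) in a.iter().zip(b.iter()) {
        acc = f32_to_bf16(bf16_to_f32(acc) + bf16_to_f32(x) * bf16_to_f32(y));
    }
    acc
}

fn main() {
    let ones = vec![f32_to_bf16(1.0); 300];
    println!("f32 accumulate:  {}", bf16_to_f32(dot_bf16_f32acc(&ones, &ones))); // 300
    println!("bf16 accumulate: {}", bf16_to_f32(dot_bf16_bf16acc(&ones, &ones))); // 256
}
```

Rounding the running sum to bf16 after every step stalls at 256, because bf16 keeps only 8 significant bits and 257 rounds back to 256; accumulating in f32 and rounding once at the end gives the exact 300. This is the usual motivation for pairing low-precision storage with a wider accumulator.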

Features

  • Automatic fallback to custom CUDA kernels for unsupported data types or for non-contiguous layouts that cuBLAS cannot express
  • Handles non-contiguous matrices whose layout fits the cuBLAS leading-dimension parameters (lda, ldb, ldc)
  • Transparent row-major to column-major layout conversion
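
The layout conversion and leading-dimension handling listed above can be sketched with the standard operand-swap trick: a row-major matrix with row stride `ld` reads, in column-major order, as its transpose with leading dimension `ld`, so a row-major C = A·B equals the column-major product Cᵀ = Bᵀ·Aᵀ with no data copied. The sketch below is a CPU reference in BLAS conventions (element (i, j) stored at `x[i + j * ldx]`); the function names are hypothetical, and whether the crate uses exactly this mapping is an assumption.

```rust
/// Column-major GEMM reference in the style of BLAS `sgemm` with no transposes:
/// C (m x n) = A (m x k) * B (k x n), element (i, j) of X stored at x[i + j * ldx].
/// A leading dimension larger than the row count describes a strided (padded) view.
fn gemm_col_major(m: usize, n: usize, k: usize,
                  a: &[f32], lda: usize, b: &[f32], ldb: usize,
                  c: &mut [f32], ldc: usize) {
    assert!(lda >= m && ldb >= k && ldc >= m);
    for j in 0..n {
        for i in 0..m {
            let mut acc = 0.0f32;
            for l in 0..k {
                acc += a[i + l * lda] * b[l + j * ldb];
            }
            c[i + j * ldc] = acc;
        }
    }
}

/// Row-major C = A * B via the column-major routine, without copying:
/// swap the operands (B first) and swap m with n, keeping each row stride as the ld.
fn gemm_row_major(m: usize, n: usize, k: usize,
                  a: &[f32], lda: usize, b: &[f32], ldb: usize,
                  c: &mut [f32], ldc: usize) {
    gemm_col_major(n, m, k, b, ldb, a, lda, c, ldc);
}

fn main() {
    // A is 2x3, stored inside a 2x4 buffer (row stride 4); the 99s are padding.
    let a = [1.0f32, 2.0, 3.0, 99.0, 4.0, 5.0, 6.0, 99.0];
    // B is 3x2, contiguous (row stride 2).
    let b = [7.0f32, 8.0, 9.0, 10.0, 11.0, 12.0];
    let mut c = [0.0f32; 4];
    gemm_row_major(2, 2, 3, &a, 4, &b, 2, &mut c, 2);
    println!("{:?}", c); // [58.0, 64.0, 139.0, 154.0]
}
```

A leading dimension can only describe views whose elements are contiguous along one axis (for example, rows padded to a wider stride, as in `a` above). Views with a non-unit stride inside a row cannot be expressed this way, which is presumably the kind of layout the custom-kernel fallback covers.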