hodu_cuda_kernels

Crates.io	hodu_cuda_kernels
lib.rs	hodu_cuda_kernels
version	0.2.4
created_at	2025-11-10 18:58:51.391914+00
updated_at	2025-11-13 22:01:14.058028+00
description	hodu cuda kernels
repository	https://github.com/hodu-rs/hodu
id	1925945
size	520,033

Han Damin (miniex)

hodu_cuda_kernels

High-performance CUDA kernels for tensor operations on NVIDIA GPUs.

cuBLAS Integration

Supported Operations

matmul: Batched matrix multiplication via cuBLAS GEMM
dot: 2D matrix multiplication via cuBLAS GEMM
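
As a point of reference for what these two operations compute, here is a minimal CPU sketch in plain Rust. It is not the crate's API: the names `batched_matmul_ref` and `dot_ref`, the flat row-major slice layout, and the treatment of `dot` as the batch-size-1 case are all assumptions made for illustration.

```rust
/// Reference batched matrix multiplication on the CPU (row-major, contiguous).
/// `a` holds `batch` matrices of shape m x k, `b` holds `batch` matrices of
/// shape k x n; the result holds `batch` matrices of shape m x n.
/// Hypothetical helper for illustration; not part of hodu_cuda_kernels.
pub fn batched_matmul_ref(
    a: &[f32],
    b: &[f32],
    batch: usize,
    m: usize,
    k: usize,
    n: usize,
) -> Vec<f32> {
    assert_eq!(a.len(), batch * m * k);
    assert_eq!(b.len(), batch * k * n);
    let mut c = vec![0.0f32; batch * m * n];
    for bi in 0..batch {
        // Each batch entry is an independent GEMM on its own slice.
        let a_mat = &a[bi * m * k..(bi + 1) * m * k];
        let b_mat = &b[bi * k * n..(bi + 1) * k * n];
        let c_mat = &mut c[bi * m * n..(bi + 1) * m * n];
        for i in 0..m {
            for j in 0..n {
                let mut acc = 0.0f32;
                for p in 0..k {
                    acc += a_mat[i * k + p] * b_mat[p * n + j];
                }
                c_mat[i * n + j] = acc;
            }
        }
    }
    c
}

/// 2D case: a single (m x k) by (k x n) product, expressed as batch size 1.
pub fn dot_ref(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    batched_matmul_ref(a, b, 1, m, k, n)
}
```

A reference like this can serve as a correctness check for GPU results on small inputs.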

Supported Data Types

bf16: BFloat16 (compute in FP32, I/O in BF16)
f16: Float16/Half (compute in FP32, I/O in FP16)
f32: Float32 (native precision)
f64: Float64 (native precision)
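
To illustrate what "compute in FP32, I/O in BF16" means in practice, the following CPU sketch stores values as raw BF16 bit patterns (`u16`) and accumulates a dot product in `f32`. All function names here are hypothetical and the rounding mode (round to nearest, ties to even) is an assumption; the crate's kernels may differ in detail.

```rust
/// Convert f32 to bf16 bits (the upper 16 bits of the f32 encoding),
/// rounding to nearest with ties to even. Illustrative helper only.
pub fn f32_to_bf16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Keep NaN a NaN: set a mantissa bit that survives truncation.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Bias is 0x7FFF, plus 1 when the kept LSB is odd (ties go to even).
    let rounding_bias = 0x7FFF + ((bits >> 16) & 1);
    ((bits + rounding_bias) >> 16) as u16
}

/// Widen bf16 bits to f32. This direction is exact.
pub fn bf16_bits_to_f32(b: u16) -> f32 {
    f32::from_bits((b as u32) << 16)
}

/// Dot product with BF16 inputs and output but an FP32 accumulator.
pub fn dot_bf16(a: &[u16], b: &[u16]) -> u16 {
    assert_eq!(a.len(), b.len());
    let mut acc = 0.0f32;
    for (&x, &y) in a.iter().zip(b) {
        acc += bf16_bits_to_f32(x) * bf16_bits_to_f32(y);
    }
    // Round once, at the end, when writing the result back as BF16.
    f32_to_bf16_bits(acc)
}

/// For contrast: rounds the running sum to BF16 after every step.
pub fn dot_bf16_rounded_acc(a: &[u16], b: &[u16]) -> u16 {
    assert_eq!(a.len(), b.len());
    let mut acc: u16 = 0; // bf16 bits of 0.0
    for (&x, &y) in a.iter().zip(b) {
        let prod = bf16_bits_to_f32(x) * bf16_bits_to_f32(y);
        acc = f32_to_bf16_bits(bf16_bits_to_f32(acc) + prod);
    }
    acc
}
```

With 512 elements all equal to 1.0, `dot_bf16` returns the bit pattern for 512, while `dot_bf16_rounded_acc` stalls at 256: 257 is not representable in BF16, and the tie rounds back down to 256. This is the motivation for keeping the accumulator in FP32.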

Features

Automatic fallback to custom CUDA kernels for unsupported types or non-contiguous matrices
Handles non-contiguous matrices via leading dimension parameters
Transparent row-major to column-major layout conversion
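
The layout conversion and the leading dimension handling listed above can be sketched on the CPU. cuBLAS expects column-major storage; a row-major matrix with row stride `ld` occupies memory exactly like its transpose in column-major form with leading dimension `ld`. Because C^T = B^T * A^T, a row-major product can be obtained from a column-major GEMM by swapping the operands and the m/n dimensions, with no data copied. The functions below are hypothetical stand-ins written for illustration (`gemm_col_major` plays the role of a cuBLAS GEMM call without transposes); they are not the crate's API.

```rust
/// Column-major GEMM reference with cuBLAS-style leading dimensions:
/// C (m x n) = A (m x k) * B (k x n). Element (i, j) of a column-major
/// matrix with leading dimension ld lives at index i + j * ld.
#[allow(clippy::too_many_arguments)]
pub fn gemm_col_major(
    m: usize, n: usize, k: usize,
    a: &[f32], lda: usize,
    b: &[f32], ldb: usize,
    c: &mut [f32], ldc: usize,
) {
    for j in 0..n {
        for i in 0..m {
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i + p * lda] * b[p + j * ldb];
            }
            c[i + j * ldc] = acc;
        }
    }
}

/// Row-major C (m x n) = A (m x k) * B (k x n) using the column-major routine.
/// `lda`, `ldb`, `ldc` are row strides here; a stride larger than the row
/// length describes a view into a wider buffer (rows not packed together).
#[allow(clippy::too_many_arguments)]
pub fn gemm_row_major(
    m: usize, n: usize, k: usize,
    a: &[f32], lda: usize,
    b: &[f32], ldb: usize,
    c: &mut [f32], ldc: usize,
) {
    // Swap operands and dimensions: computes C^T = B^T * A^T in column-major
    // terms, which is exactly C in row-major memory.
    gemm_col_major(n, m, k, b, ldb, a, lda, c, ldc);
}
```

For example, a 2x2 matrix viewed inside a buffer with three elements per row is passed with `lda = 3`; the padding element in each row is never read. Layouts that a single stride per matrix cannot describe are the kind of case where a fallback kernel would be needed.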
