| Crates.io | npu-rs |
| lib.rs | npu-rs |
| version | 0.1.2 |
| created_at | 2025-10-20 10:29:35.214678+00 |
| updated_at | 2025-10-22 06:50:27.232433+00 |
| description | An NPU driver for RISC-V boards |
| homepage | https://github.com/KushalMeghani1644/NPU-rs |
| repository | https://github.com/KushalMeghani1644/NPU-rs |
| max_upload_size | |
| id | 1891724 |
| size | 132,143 |
A simulated Rust driver for neural processing units (NPUs) on RISC-V boards, modeling 20 TOPS of peak performance.
NOTE: *This crate is a simulator.* Real hardware integration requires a HAL implementation and Linux kernel module support.
NOTE: I don't own a RISC-V board, so this code has not been tested on real RISC-V hardware; use at your own risk.
Core Compute
Memory Management
Power Management
Performance Analysis
Model Optimization
Device Management
tensor: Tensor operations (add, sub, mul, div, relu, sigmoid)
device: Device driver and state management
memory: Memory allocation and tracking
compute: Matrix multiplication and convolution units
execution: Operation execution and scheduling
power: DVFS and thermal management
model: Neural network model definitions
quantization: INT8 quantization and calibration
optimizer: Graph optimization
profiler: Performance profiling
perf_monitor: Real-time metrics
error: Error handling
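As a standalone illustration of what the `quantization` module's INT8 quantization and calibration involve, here is a minimal sketch in plain Rust. It is independent of the crate's actual API: `quantize` and `dequantize` are hypothetical stand-ins showing symmetric per-tensor quantization, where calibration picks a scale so the largest magnitude maps to 127.

```rust
// Symmetric INT8 quantization sketch (NOT the crate's API):
// calibrate a scale from the data, map f32 -> i8, and back.
fn quantize(xs: &[f32]) -> (Vec<i8>, f32) {
    // Calibration: the largest absolute value maps to the i8 extreme 127.
    let max_abs = xs.iter().fold(0.0_f32, |m, &x| m.max(x.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = xs
        .iter()
        .map(|&x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let xs = [0.5_f32, -1.27, 1.27];
    let (q, scale) = quantize(&xs);
    println!("quantized: {:?}, scale: {}", q, scale); // [50, -127, 127]
    println!("restored: {:?}", dequantize(&q, scale));
}
```

The round trip loses at most half a quantization step per element, which is the usual accuracy/size trade-off INT8 inference accepts.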
cargo install npu-rs
cargo build --release
NOTE: THIS CODE RUNS ON CPU ONLY; NO REAL HARDWARE EXECUTION
cargo run # Full demo
cargo run --example full_inference_pipeline # Example pipeline
Peak Throughput - 20 TOPS
Memory - 512 MB
Compute Units - 4
Frequency - 400-1000 MHz (via DVFS)
Power TDP - 1.2-5.0 W
Thermal Limit - 90 °C
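A quick back-of-the-envelope check of what these simulated specs imply: at the stated 20 TOPS peak with 4 compute units clocked at the 1000 MHz DVFS ceiling, each unit must retire 5,000 operations per cycle.

```rust
fn main() {
    let peak_tops = 20.0_f64;      // 20 TOPS peak, from the spec list above
    let units = 4.0_f64;           // 4 compute units
    let peak_hz = 1.0e9_f64;       // 1000 MHz top DVFS frequency
    // Operations each unit must complete per cycle to reach the peak figure.
    let ops_per_unit_per_cycle = peak_tops * 1e12 / (units * peak_hz);
    println!("{} ops/unit/cycle", ops_per_unit_per_cycle); // 5000
}
```

That magnitude is typical of a wide MAC array (e.g. a systolic-style matrix engine) rather than scalar lanes, which is consistent with the `compute` module exposing matrix multiplication and convolution units.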
use npu_rs::{NpuDevice, Tensor, ExecutionContext};
use std::sync::Arc;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create and initialize the simulated device.
    let device = Arc::new(NpuDevice::new());
    device.initialize()?;

    // Bind an execution context to the device.
    let ctx = ExecutionContext::new(device);

    // Random input tensors: [4, 8] x [8, 6] -> [4, 6].
    let a = Tensor::random(&[4, 8]);
    let b = Tensor::random(&[8, 6]);
    let result = ctx.execute_matmul(&a.data, &b.data)?;
    println!("Result: {:?}", result.shape());
    Ok(())
}