| Crates.io | ruvllm-wasm |
|---|---|
| lib.rs | ruvllm-wasm |
| version | 2.0.0 |
| created_at | 2026-01-22 05:23:36.20332+00 |
| updated_at | 2026-01-22 05:23:36.20332+00 |
| description | WASM bindings for RuvLLM - browser-compatible LLM inference runtime with WebGPU acceleration |
| homepage | |
| repository | https://github.com/ruvnet/ruvector |
| max_upload_size | |
| id | 2060810 |
| size | 400,487 bytes |
WASM bindings for browser-based LLM inference with WebGPU acceleration, SIMD optimizations, and intelligent routing.
Add to your `Cargo.toml`:

```toml
[dependencies]
ruvllm-wasm = "2.0"
```

Or build for WASM with wasm-pack:

```bash
wasm-pack build --target web --release
```
Basic usage from Rust:

```rust
use ruvllm_wasm::{RuvLLMWasm, GenerationConfig};

// Initialize with WebGPU (if available)
let llm = RuvLLMWasm::new(true).await?;

// Load a GGUF model
llm.load_model_from_url("https://example.com/model.gguf").await?;

// Generate text
let config = GenerationConfig {
    max_tokens: 100,
    temperature: 0.7,
    top_p: 0.9,
    ..Default::default()
};

let result = llm.generate("What is the capital of France?", &config).await?;
println!("{}", result.text);
```
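If you would rather not force WebGPU on, you can probe the browser first with `check_webgpu` from the API reference below. A minimal sketch; the `available` field name on `WebGPUStatus` is an assumption, not confirmed by the crate docs:

```rust
use ruvllm_wasm::RuvLLMWasm;

// Probe WebGPU support before picking a backend.
// NOTE: `available` is an assumed field name on WebGPUStatus.
let gpu = RuvLLMWasm::check_webgpu().await;

// Fall back to the CPU path when WebGPU is missing or blocked.
let llm = RuvLLMWasm::new(gpu.available).await?;
```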
The same flow from JavaScript:

```javascript
import init, { RuvLLMWasm } from 'ruvllm-wasm';

await init();

// Create instance with WebGPU
const llm = await RuvLLMWasm.new(true);

// Load model with download progress reporting
await llm.load_model_from_url('https://example.com/model.gguf', (loaded, total) => {
  console.log(`Loading: ${Math.round(loaded / total * 100)}%`);
});

// Generate with streaming
await llm.generate_stream('Tell me a story', {
  max_tokens: 200,
  temperature: 0.8,
}, (token) => {
  process.stdout.write(token);
});
```
Feature flags:

The `webgpu` feature enables GPU-accelerated inference using WebGPU compute shaders:

```toml
[dependencies]
ruvllm-wasm = { version = "2.0", features = ["webgpu"] }
```
The `parallel` feature runs inference in Web Workers (this needs the cross-origin isolation headers shown below):

```toml
[dependencies]
ruvllm-wasm = { version = "2.0", features = ["parallel"] }
```
The `simd` feature requires building with the `simd128` target feature enabled:

```toml
[dependencies]
ruvllm-wasm = { version = "2.0", features = ["simd"] }
```

```bash
RUSTFLAGS="-C target-feature=+simd128" wasm-pack build --target web
```
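To avoid exporting `RUSTFLAGS` for every build, the same flag can live in a `.cargo/config.toml` at the project root (standard Cargo behavior, not specific to this crate):

```toml
# .cargo/config.toml: apply the SIMD flag to every wasm build
[target.wasm32-unknown-unknown]
rustflags = ["-C", "target-feature=+simd128"]
```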
The `intelligent` feature enables advanced AI features such as the intelligent routing mentioned above:

```toml
[dependencies]
ruvllm-wasm = { version = "2.0", features = ["intelligent"] }
```
Browser requirements:

| Feature | Required | Benefit |
|---|---|---|
| WebAssembly | Yes | Core execution |
| WebGPU | No (recommended) | 10-50x faster |
| SharedArrayBuffer | No | Multi-threading |
| SIMD | No | 2-4x faster math |
Multi-threading via SharedArrayBuffer requires cross-origin isolation. Add these headers to your server:

```
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
```
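For local development, any static server that can set response headers will do. A minimal sketch using `axum` and `tower-http` (these crates, the `pkg` directory, and the port are assumptions, not part of ruvllm-wasm):

```rust
use axum::{
    http::{header::HeaderName, HeaderValue},
    Router,
};
use tower_http::{services::ServeDir, set_header::SetResponseHeaderLayer};

#[tokio::main]
async fn main() {
    // Serve the wasm-pack output with the two cross-origin
    // isolation headers that SharedArrayBuffer requires.
    let app = Router::new()
        .fallback_service(ServeDir::new("pkg"))
        .layer(SetResponseHeaderLayer::overriding(
            HeaderName::from_static("cross-origin-opener-policy"),
            HeaderValue::from_static("same-origin"),
        ))
        .layer(SetResponseHeaderLayer::overriding(
            HeaderName::from_static("cross-origin-embedder-policy"),
            HeaderValue::from_static("require-corp"),
        ));

    let listener = tokio::net::TcpListener::bind("127.0.0.1:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```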
Recommended models:

| Model | Size | Use Case |
|---|---|---|
| TinyLlama-1.1B-Q4 | ~700 MB | General chat |
| Phi-2-Q4 | ~1.6 GB | Code, reasoning |
| Qwen2-0.5B-Q4 | ~400 MB | Fast responses |
| StableLM-Zephyr-3B-Q4 | ~2 GB | Quality chat |
The full Rust API:

```rust
impl RuvLLMWasm {
    /// Create a new instance
    pub async fn new(use_webgpu: bool) -> Result<Self, JsValue>;

    /// Load model from URL
    pub async fn load_model_from_url(&self, url: &str) -> Result<(), JsValue>;

    /// Load model from bytes
    pub async fn load_model_from_bytes(&self, bytes: &[u8]) -> Result<(), JsValue>;

    /// Generate text completion
    pub async fn generate(&self, prompt: &str, config: &GenerationConfig) -> Result<GenerationResult, JsValue>;

    /// Generate with streaming callback
    pub async fn generate_stream(&self, prompt: &str, config: &GenerationConfig, callback: js_sys::Function) -> Result<GenerationResult, JsValue>;

    /// Check WebGPU availability
    pub async fn check_webgpu() -> WebGPUStatus;

    /// Get browser capabilities
    pub async fn get_capabilities() -> BrowserCapabilities;

    /// Unload model and free memory
    pub fn unload(&self);
}
```
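`generate_stream` takes a `js_sys::Function`, so calling it from Rust means wrapping a Rust closure with `wasm_bindgen`'s `Closure`. A minimal sketch, assuming `js-sys` and `web-sys` (with its `console` feature) as dependencies and reusing `llm` and `config` from the quick-start example:

```rust
use wasm_bindgen::{closure::Closure, JsCast};

// Wrap a Rust closure so it can cross the JS boundary as a js_sys::Function.
let on_token = Closure::wrap(Box::new(|token: String| {
    web_sys::console::log_1(&token.into());
}) as Box<dyn Fn(String)>);

// `on_token` must stay alive until the call resolves; it does here
// because it remains in scope across the await.
let f: &js_sys::Function = on_token.as_ref().unchecked_ref();
let result = llm.generate_stream("Tell me a story", &config, f.clone()).await?;
```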
License: MIT OR Apache-2.0
Part of the RuVector ecosystem, a high-performance vector database with self-learning capabilities.