| Crates.io | kv-cache |
| lib.rs | kv-cache |
| version | 0.1.0 |
| created_at | 2025-08-01 06:56:16.317837+00 |
| updated_at | 2025-08-01 06:56:16.317837+00 |
| description | LLM inference in Rust |
| homepage | |
| repository | https://github.com/zTgx/llama.rust |
| max_upload_size | |
| id | 1776195 |
| size | 2,333 |
LLM inference in Rust
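Per the crate name, a central data structure in decoder-style LLM inference is the key-value (KV) cache: the attention keys and values computed for earlier tokens are stored and reused on every subsequent decoding step instead of being recomputed. Below is a minimal sketch of that idea with a flat per-layer buffer layout; the KvCache type and its methods are illustrative assumptions, not this crate's actual API.

```rust
/// Illustrative per-layer KV cache for autoregressive decoding.
/// The name, layout, and methods are assumptions for exposition,
/// not this crate's actual API.
struct KvCache {
    head_dim: usize, // dimensionality of each key/value vector
    keys: Vec<f32>,  // flattened [num_tokens * head_dim]
    values: Vec<f32>,
}

impl KvCache {
    fn new(head_dim: usize) -> Self {
        Self { head_dim, keys: Vec::new(), values: Vec::new() }
    }

    /// Append the key/value vectors computed for the newest token;
    /// each token's projections are computed exactly once.
    fn append(&mut self, key: &[f32], value: &[f32]) {
        assert_eq!(key.len(), self.head_dim);
        assert_eq!(value.len(), self.head_dim);
        self.keys.extend_from_slice(key);
        self.values.extend_from_slice(value);
    }

    /// Number of tokens currently cached.
    fn seq_len(&self) -> usize {
        self.keys.len() / self.head_dim
    }

    /// Borrow the cached key vector for token `i`.
    fn key(&self, i: usize) -> &[f32] {
        &self.keys[i * self.head_dim..(i + 1) * self.head_dim]
    }
}

fn main() {
    let mut cache = KvCache::new(4);
    cache.append(&[0.1, 0.2, 0.3, 0.4], &[1.0; 4]);
    cache.append(&[0.5, 0.6, 0.7, 0.8], &[2.0; 4]);
    assert_eq!(cache.seq_len(), 2);
    println!("first cached key: {:?}", cache.key(0));
}
```

Without such a cache, generating the n-th token would require recomputing the key/value projections for all n-1 previous tokens at every step.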
cargo run --release -p llama-rust -- \
  --model "meta-llama/Llama-2-7b-hf" \
  --prompt "What is the capital of France?" \
  --max-tokens 20 --temperature 0.7 --cpu
- --model: The Hugging Face model ID (or a local path to the model).
- --prompt: The prompt to use for inference.
- --max-tokens: The maximum number of tokens to generate.
- --temperature: The temperature to use for sampling.

A sketch of how these flags might be declared is shown below.

This project draws inspiration from:
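For illustration, these flags could be declared with the clap crate (an assumed dependency, with its derive feature enabled); the actual llama-rust binary may parse its arguments differently, and the meaning of --cpu (run on CPU rather than GPU) is inferred from its name.

```rust
use clap::Parser;

/// Hypothetical mirror of the command-line flags above.
#[derive(Parser)]
struct Args {
    /// Hugging Face model ID (or a local path to the model)
    #[arg(long)]
    model: String,

    /// Prompt to use for inference
    #[arg(long)]
    prompt: String,

    /// Maximum number of tokens to generate
    #[arg(long, default_value_t = 20)]
    max_tokens: usize,

    /// Sampling temperature
    #[arg(long, default_value_t = 0.7)]
    temperature: f64,

    /// Run inference on the CPU instead of a GPU (inferred meaning)
    #[arg(long)]
    cpu: bool,
}

fn main() {
    // clap derives --model, --prompt, --max-tokens, --temperature, and
    // --cpu from the field names above.
    let args = Args::parse();
    println!(
        "model={} prompt={:?} max_tokens={} temperature={}",
        args.model, args.prompt, args.max_tokens, args.temperature
    );
}
```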