| Crates.io | kv-cache |
| lib.rs | kv-cache |
| version | 0.1.0 |
| created_at | 2025-08-01 06:56:16.317837+00 |
| updated_at | 2025-08-01 06:56:16.317837+00 |
| description | LLM inference in Rust |
| homepage | |
| repository | https://github.com/zTgx/llama.rust |
| max_upload_size | |
| id | 1776195 |
| size | 2,333 |
LLM inference in Rust
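Per the crate name, a central data structure in decoder-style LLM inference is the key-value (KV) cache: the attention keys and values computed for earlier tokens are stored and reused on every subsequent decoding step instead of being recomputed. Below is a minimal sketch of that idea with a flat per-layer buffer layout; the KvCache type and its methods are illustrative assumptions, not this crate's actual API.

```rust
/// Illustrative per-layer KV cache for autoregressive decoding.
/// The name, layout, and methods are assumptions for exposition,
/// not this crate's actual API.
struct KvCache {
    head_dim: usize, // dimensionality of each key/value vector
    keys: Vec<f32>,  // flattened [num_tokens * head_dim]
    values: Vec<f32>,
}

impl KvCache {
    fn new(head_dim: usize) -> Self {
        Self { head_dim, keys: Vec::new(), values: Vec::new() }
    }

    /// Append the key/value vectors computed for the newest token;
    /// each token's projections are computed exactly once.
    fn append(&mut self, key: &[f32], value: &[f32]) {
        assert_eq!(key.len(), self.head_dim);
        assert_eq!(value.len(), self.head_dim);
        self.keys.extend_from_slice(key);
        self.values.extend_from_slice(value);
    }

    /// Number of tokens currently cached.
    fn seq_len(&self) -> usize {
        self.keys.len() / self.head_dim
    }

    /// Borrow the cached key vector for token `i`.
    fn key(&self, i: usize) -> &[f32] {
        &self.keys[i * self.head_dim..(i + 1) * self.head_dim]
    }
}

fn main() {
    let mut cache = KvCache::new(4);
    cache.append(&[0.1, 0.2, 0.3, 0.4], &[1.0; 4]);
    cache.append(&[0.5, 0.6, 0.7, 0.8], &[2.0; 4]);
    assert_eq!(cache.seq_len(), 2);
    println!("first cached key: {:?}", cache.key(0));
}
```

Without such a cache, generating the n-th token would require recomputing the key/value projections for all n-1 previous tokens at every step.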
cargo run --release -p llama-rust -- \
  --model "meta-llama/Llama-2-7b-hf" \
  --prompt "What is the capital of France?" \
  --max-tokens 20 --temperature 0.7 --cpu
- --model: The Hugging Face model ID (or a local path to the model).
- --prompt: The prompt to use for inference.
- --max-tokens: The maximum number of tokens to generate.
- --temperature: The temperature to use for sampling.

A sketch of how these flags might be declared is shown below.

This project draws inspiration from:
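For illustration, these flags could be declared with the clap crate (an assumed dependency, with its derive feature enabled); the actual llama-rust binary may parse its arguments differently, and the meaning of --cpu (run on CPU rather than GPU) is inferred from its name.

```rust
use clap::Parser;

/// Hypothetical mirror of the command-line flags above.
#[derive(Parser)]
struct Args {
    /// Hugging Face model ID (or a local path to the model)
    #[arg(long)]
    model: String,

    /// Prompt to use for inference
    #[arg(long)]
    prompt: String,

    /// Maximum number of tokens to generate
    #[arg(long, default_value_t = 20)]
    max_tokens: usize,

    /// Sampling temperature
    #[arg(long, default_value_t = 0.7)]
    temperature: f64,

    /// Run inference on the CPU instead of a GPU (inferred meaning)
    #[arg(long)]
    cpu: bool,
}

fn main() {
    // clap derives --model, --prompt, --max-tokens, --temperature, and
    // --cpu from the field names above.
    let args = Args::parse();
    println!(
        "model={} prompt={:?} max_tokens={} temperature={}",
        args.model, args.prompt, args.max_tokens, args.temperature
    );
}
```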