kv-cache

Crate: kv-cache (crates.io / lib.rs)
Version: 0.1.0
Description: LLM inference in Rust
Repository: https://github.com/zTgx/llama.rust
Owner: zTgx

README

llama.rust

LLM inference in Rust
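
A KV cache stores the attention key and value vectors of tokens that have already been processed, so decoding each new token reuses them instead of recomputing attention over the whole prefix. The crate's own API is not shown on this page, so the following is only a minimal sketch of the idea; the type and method names (KvCache, append, len) are illustrative assumptions, not the crate's real interface.

// Minimal per-layer KV cache sketch. All names here (KvCache, `append`,
// `len`) are illustrative assumptions, not this crate's actual API.
// Keys and values are stored as one flat row of size `head_dim` per token.

struct KvCache {
    head_dim: usize,
    keys: Vec<Vec<f32>>,   // one entry per cached token position
    values: Vec<Vec<f32>>, // parallel to `keys`
}

impl KvCache {
    fn new(head_dim: usize) -> Self {
        Self { head_dim, keys: Vec::new(), values: Vec::new() }
    }

    /// Append the key/value projection of the newest token; during decoding,
    /// attention then reads all cached entries instead of recomputing them.
    fn append(&mut self, k: Vec<f32>, v: Vec<f32>) {
        assert_eq!(k.len(), self.head_dim);
        assert_eq!(v.len(), self.head_dim);
        self.keys.push(k);
        self.values.push(v);
    }

    fn len(&self) -> usize {
        self.keys.len()
    }
}

fn main() {
    let mut cache = KvCache::new(4);
    cache.append(vec![0.1, 0.2, 0.3, 0.4], vec![1.0, 0.0, 0.0, 0.0]);
    cache.append(vec![0.5, 0.6, 0.7, 0.8], vec![0.0, 1.0, 0.0, 0.0]);
    println!("cached positions: {}", cache.len()); // prints 2
}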

Inference

cargo run --release -p llama-rust -- --model "meta-llama/Llama-2-7b-hf" \
    --prompt "What is the capital of France?" --max-tokens 20 --temperature 0.7 --cpu

Parameters

  • --model: Hugging Face model ID (e.g. "meta-llama/Llama-2-7b-hf") or a local path to the model.
  • --prompt: The prompt to run inference on.
  • --max-tokens: The maximum number of tokens to generate.
  • --temperature: The sampling temperature; higher values make the output more random (see the sketch after this list).
  • --cpu: Run inference on the CPU (as used in the command above).
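
The temperature divides the logits before the softmax: values below 1 sharpen the distribution toward the most likely token, while values above 1 flatten it. This project's actual sampling code is not shown here; the sketch below is a generic, dependency-free illustration, and the greedy fallback at temperature 0 is an assumption.

// Illustrative temperature sampling, not this project's actual code.
// Logits are divided by the temperature, softmaxed, then sampled from.

fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Sample a token index given logits and a temperature. `rand01` is a
/// uniform random number in [0, 1) supplied by the caller so the sketch
/// stays dependency-free.
fn sample(logits: &[f32], temperature: f32, rand01: f32) -> usize {
    if temperature <= 0.0 {
        // Assumed behavior: temperature 0 falls back to greedy argmax.
        return logits
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i)
            .unwrap();
    }
    let scaled: Vec<f32> = logits.iter().map(|&x| x / temperature).collect();
    let probs = softmax(&scaled);
    // Inverse-CDF sampling over the categorical distribution.
    let mut acc = 0.0;
    for (i, &p) in probs.iter().enumerate() {
        acc += p;
        if rand01 < acc {
            return i;
        }
    }
    probs.len() - 1
}

fn main() {
    let logits = [2.0_f32, 1.0, 0.1];
    // Lower temperature concentrates probability on the top logit.
    println!("t=0.7 pick: {}", sample(&logits, 0.7, 0.42));
    println!("greedy pick: {}", sample(&logits, 0.0, 0.0));
}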

References

This project draws inspiration from:
