| Crates.io | tllama |
| lib.rs | tllama |
| version | 0.1.1 |
| created_at | 2025-09-30 10:29:49.505957+00 |
| updated_at | 2025-09-30 10:29:49.505957+00 |
| description | Lightweight Local LLM Inference Engine |
| homepage | |
| repository | https://github.com/moyanj/tllama |
| max_upload_size | |
| id | 1860965 |
| size | 170,962 |
Lightweight Local LLM Inference Engine
Tllama is a Rust-based, open-source LLM inference engine designed for efficient local execution. It provides a command-line interface and an OpenAI-compatible API for seamless model interaction.
Install with the one-line script:
curl -sSL https://raw.githubusercontent.com/moyanj/tllama/main/install.sh | bash
Or via Cargo:
cargo install tllama
Or download a prebuilt binary from the GitHub Releases page.
tllama discover [--all]
tllama infer <model_path> "<prompt>" [options]

Options:
  --n-len <tokens>          Output length (default: 128)
  --temperature <value>     Randomness (0-1)
  --top-k <value>           Top-k sampling
  --repeat-penalty <value>  Repetition penalty
Example:
tllama infer ./llama3-8b.gguf "The future of AI is" \
--temperature 0.7 \
--n-len 256
tllama chat <model_path>
tllama serve [options]

Options:
  --host <addr>   Bind address (default: 0.0.0.0)
  --port <port>   Port (default: 8080)
Chat API Example:
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama3-8b",
"messages": [
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "Explain Rust's memory safety"}
],
"temperature": 0.7,
"max_tokens": 200
}'
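Because the API is OpenAI-compatible, the same request can also be sent with the official OpenAI Python SDK by pointing its base_url at the local server. This is a minimal sketch, assuming the openai package (v1+) is installed; the api_key value is a placeholder, since the SDK requires one even though the local server is assumed not to check it:

from openai import OpenAI

# Point the SDK at the local tllama server started with `tllama serve`
client = OpenAI(base_url="http://localhost:8080/v1", api_key="tllama")

response = client.chat.completions.create(
    model="llama3-8b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Explain Rust's memory safety"},
    ],
    temperature=0.7,
    max_tokens=200,
)
print(response.choices[0].message.content)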
PRs welcome! See CONTRIBUTING.md for guidelines.
MIT License
Terminal-first: Optimized for CLI workflows with 10x faster startup than Ollama
Minimal footprint: Single binary under 5MB, zero external dependencies
Seamless integration: Compatible with OpenAI SDKs and LangChain (see the sketch below)
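As a sketch of the LangChain integration (assuming the langchain-openai package is installed; the model name, server address, and api_key placeholder follow the serve example above):

from langchain_openai import ChatOpenAI

# ChatOpenAI speaks the OpenAI chat API, so it can target the local tllama server
llm = ChatOpenAI(
    model="llama3-8b",
    base_url="http://localhost:8080/v1",
    api_key="tllama",  # placeholder; the local server is assumed not to check it
    temperature=0.7,
)
print(llm.invoke("Explain Rust's memory safety").content)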
⭐ Star us on GitHub to show your support!