| Crates.io | tllama |
| lib.rs | tllama |
| version | 0.1.1 |
| created_at | 2025-09-30 10:29:49.505957+00 |
| updated_at | 2025-09-30 10:29:49.505957+00 |
| description | Lightweight Local LLM Inference Engine |
| homepage | |
| repository | https://github.com/moyanj/tllama |
| max_upload_size | |
| id | 1860965 |
| size | 170,962 |
Lightweight Local LLM Inference Engine
Tllama is a Rust-based, open-source LLM inference engine designed for efficient local execution. It provides a command-line interface and an OpenAI-compatible API for seamless model interaction.
Install with the one-line script:
curl -sSL https://raw.githubusercontent.com/moyanj/tllama/main/install.sh | bash
Or via Cargo:
cargo install tllama
Or download a prebuilt binary from the GitHub Releases page.
tllama discover [--all]
tllama infer <model_path> "<prompt>" [options]

Options:
  --n-len <tokens>          Output length (default: 128)
  --temperature <value>     Randomness (0-1)
  --top-k <value>           Top-k sampling
  --repeat-penalty <value>  Repetition penalty
Example:
tllama infer ./llama3-8b.gguf "The future of AI is" \
--temperature 0.7 \
--n-len 256
tllama chat <model_path>
tllama serve [options]

Options:
  --host <addr>   Bind address (default: 0.0.0.0)
  --port <port>   Port (default: 8080)
Chat API Example:
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama3-8b",
"messages": [
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "Explain Rust's memory safety"}
],
"temperature": 0.7,
"max_tokens": 200
}'
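Because the API is OpenAI-compatible, the same request can also be sent with the official OpenAI Python SDK by pointing its base_url at the local server. This is a minimal sketch, assuming the openai package (v1+) is installed; the api_key value is a placeholder, since the SDK requires one even though the local server is assumed not to check it:

from openai import OpenAI

# Point the SDK at the local tllama server started with `tllama serve`
client = OpenAI(base_url="http://localhost:8080/v1", api_key="tllama")

response = client.chat.completions.create(
    model="llama3-8b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Explain Rust's memory safety"},
    ],
    temperature=0.7,
    max_tokens=200,
)
print(response.choices[0].message.content)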
PRs welcome! See CONTRIBUTING.md for guidelines.
MIT License
Terminal-first: Optimized for CLI workflows with 10x faster startup than Ollama
Minimal footprint: Single binary under 5MB, zero external dependencies
Seamless integration: Compatible with OpenAI SDKs and LangChain (see the sketch below)
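As a sketch of the LangChain integration (assuming the langchain-openai package is installed; the model name, server address, and api_key placeholder follow the serve example above):

from langchain_openai import ChatOpenAI

# ChatOpenAI speaks the OpenAI chat API, so it can target the local tllama server
llm = ChatOpenAI(
    model="llama3-8b",
    base_url="http://localhost:8080/v1",
    api_key="tllama",  # placeholder; the local server is assumed not to check it
    temperature=0.7,
)
print(llm.invoke("Explain Rust's memory safety").content)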
⭐ Star us on GitHub to show your support!