tllama

Crate: tllama (crates.io)
Version: 0.1.1
Created: 2025-09-30 10:29:49 UTC
Updated: 2025-09-30 10:29:49 UTC
Description: Lightweight Local LLM Inference Engine
Repository: https://github.com/moyanj/tllama
Size: 170,962 bytes
Owner: MoYan (moyanj)

documentation

https://docs.rs/crate/tllama

README

🧠 Tllama

🚀 Lightweight Local LLM Inference Engine

Tllama is a Rust-based open-source LLM inference engine designed for efficient local execution. It features a command-line interface and an OpenAI-compatible API for seamless model interaction.


🚀 Key Features

  • 🔍 Smart model detection
  • 🤝 Full OpenAI API compatibility
  • ⚡ Blazing-fast startup (<0.5s)
  • 📦 Ultra-compact binary (<20MB)

📦 Installation

Script install

curl -sSL https://raw.githubusercontent.com/moyanj/tllama/main/install.sh | bash

Cargo install

cargo install tllama

Pre-built binaries

Download from Releases


🧪 Usage Guide

Discover models

tllama discover [--all]

Text generation

tllama infer <model_path> "<prompt>" \
  --n-len <tokens> \
  --temperature <value> \
  --top-k <value> \
  --repeat-penalty <value>

# --n-len           Output length in tokens (default: 128)
# --temperature     Sampling randomness (0-1)
# --top-k           Top-k sampling
# --repeat-penalty  Repetition penalty

Example:

tllama infer ./llama3-8b.gguf "The future of AI is" \
  --temperature 0.7 \
  --n-len 256

Interactive chat

tllama chat <model_path>

Start API server

tllama serve \
  --host 0.0.0.0 \
  --port 8080

# --host  Bind address (default: 0.0.0.0)
# --port  Port (default: 8080)

Chat API Example:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3-8b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant"},
      {"role": "user", "content": "Explain Rust's memory safety"}
    ],
    "temperature": 0.7,
    "max_tokens": 200
  }'
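
Because the server speaks the OpenAI chat-completions protocol, the official OpenAI SDKs can be pointed at it directly. Below is a minimal Python sketch, assuming the `tllama serve` defaults shown above; the `api_key` value is a placeholder (a local server typically ignores it), and `llama3-8b` is an illustrative model name:

# Minimal sketch: call a local tllama server through the official
# OpenAI Python SDK (pip install openai). Assumes `tllama serve` is
# running on localhost:8080.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # point the SDK at tllama
    api_key="not-needed",                 # placeholder; likely unchecked locally
)

response = client.chat.completions.create(
    model="llama3-8b",  # illustrative model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Explain Rust's memory safety"},
    ],
    temperature=0.7,
    max_tokens=200,
)

print(response.choices[0].message.content)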

📅 Development Roadmap

  • Core CLI implementation
  • GGUF quantized model support
  • Model auto-download & caching
  • Web UI integration
  • Comprehensive test suite

🙌 Contributing

PRs welcome! See CONTRIBUTING.md for guidelines.


🔐 License

MIT License


✨ Design Philosophy

  • Terminal-first: optimized for CLI workflows, with 10x faster startup than Ollama
  • Minimal footprint: single binary under 5MB, zero external dependencies
  • Seamless integration: compatible with OpenAI SDKs and LangChain (see the sketch below)
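
As one example of that integration, LangChain's OpenAI-compatible chat wrapper can target a local tllama server. A hedged sketch, assuming the `langchain-openai` package and the server defaults from the Usage Guide; the model name is again illustrative:

# Sketch: use tllama's OpenAI-compatible endpoint from LangChain
# (pip install langchain-openai). base_url matches the `tllama serve`
# defaults shown above; api_key is a placeholder for a local server.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="llama3-8b",                    # illustrative model name
    base_url="http://localhost:8080/v1",  # local tllama server
    api_key="not-needed",                 # placeholder; likely unchecked
    temperature=0.7,
)

print(llm.invoke("Explain Rust's memory safety").content)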


📬 Contact

⭐ Star us on GitHub to show your support!
