Crates.io | lmonade |
lib.rs | lmonade |
version | 0.1.0-alpha.2 |
created_at | 2025-08-20 14:13:31.939927+00 |
updated_at | 2025-08-21 22:08:24.779015+00 |
description | LLM inference engine - main crate with CLI and re-exports |
homepage | |
repository | https://jgok76.gitea.cloud/femtomc/lmonade |
max_upload_size | |
id | 1803474 |
size | 152,861 bytes |
An LLM inference engine built in Rust with an actor-based architecture.
```bash
# Clone and build
git clone https://jgok76.gitea.cloud/femtomc/lmonade.git
cd lmonade
cargo build --release

# The CLI will be available at ./target/release/lmonade
```
```bash
# Download a model
lmonade model download TinyLlama/TinyLlama-1.1B-Chat-v1.0

# Chat with the model
lmonade chat "Hello, how are you today?"

# Stream responses in real-time
lmonade chat --stream "Tell me a story about space"

# Use a specific model
lmonade chat --model TinyLlama-1.1B-Chat-v1.0 "Explain quantum computing"

# Start the API server (run a lmonade stand!)
lmonade stand

# List downloaded models
lmonade model list

# Show model information
lmonade model info TinyLlama-1.1B-Chat-v1.0

# Get help
lmonade --help
lmonade chat --help
```
Start the OpenAI-compatible API server:
```bash
# Start the server (default port 8080)
lmonade stand

# Custom configuration
lmonade stand --host 0.0.0.0 --port 3000 --model TinyLlama-1.1B-Chat-v1.0

# Or build and run the binary directly
cargo build --release
./target/release/lmonade stand
```
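Once the server is running, you can sanity-check it from Python. A minimal sketch using the `requests` package, assuming the OpenAI-compatible surface includes the standard `GET /v1/models` listing endpoint (not confirmed in this README):

```python
# Liveness check: list the models the server exposes.
# Assumes the OpenAI-compatible API includes GET /v1/models.
import requests

resp = requests.get("http://localhost:8080/v1/models", timeout=5)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model["id"])
```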
Make requests to the API:
```bash
# Chat completion
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "TinyLlama-1.1B-Chat-v1.0",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Streaming
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "TinyLlama-1.1B-Chat-v1.0",
    "messages": [{"role": "user", "content": "Tell me a joke"}],
    "stream": true
  }'
```
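If you would rather consume the stream without the OpenAI SDK, the sketch below parses the response by hand. It assumes the server emits OpenAI-style server-sent events, i.e. `data: {...}` lines terminated by `data: [DONE]`, which is the standard framing for OpenAI-compatible streaming endpoints:

```python
# Minimal SSE consumer for the streaming endpoint, assuming OpenAI-style
# "data: {...}" frames ending with "data: [DONE]".
import json
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "TinyLlama-1.1B-Chat-v1.0",
        "messages": [{"role": "user", "content": "Tell me a joke"}],
        "stream": True,
    },
    stream=True,  # iterate over the body as it arrives
)
for line in resp.iter_lines():
    if not line.startswith(b"data: "):
        continue  # skip blank keep-alive lines between events
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":
        break
    delta = json.loads(payload)["choices"][0]["delta"].get("content")
    if delta:
        print(delta, end="", flush=True)
```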
Guide | Description |
---|---|
Getting Started | Installation and first steps |
CLI Guide | Complete CLI reference |
API Reference | HTTP API documentation |
Architecture | System design and internals |
Development | Contributing and extending |
Model | Size | Status | Notes |
---|---|---|---|
TinyLlama-1.1B-Chat-v1.0 | 1.1B | Ready | Optimized for chat |
Llama 2 | 7B-70B | In Progress | Coming soon |
Mistral | 7B | Planned | Q1 2025 |
Mixtral | 8x7B | Planned | MoE support |
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="TinyLlama-1.1B-Chat-v1.0",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)

for chunk in response:
    # The final chunk's delta may carry no content, so guard against None
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='', flush=True)
```
```javascript
const response = await fetch('http://localhost:8080/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'TinyLlama-1.1B-Chat-v1.0',
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: false
  })
});

const data = await response.json();
console.log(data.choices[0].message.content);
```
```bash
# Development build
cargo build

# Optimized release build
cargo build --release

# Run tests
cargo test

# Run with debug logging
RUST_LOG=debug cargo run --bin lmonade chat "Hello"
```
```text
lmonade/
├── lmonade/           # CLI application
├── lmonade-models/    # Model implementations
├── lmonade-runtime/   # Actor-based runtime
├── lmonade-server/    # HTTP API server
└── docs/              # Documentation
    ├── getting-started/  # Installation & setup
    ├── cli/              # CLI documentation
    ├── api/              # API reference
    ├── architecture/     # System design
    └── development/      # Developer guides
```
We welcome contributions! See CONTRIBUTING.md for guidelines and the key areas where help is wanted.
GPL v3.0 - See LICENSE for details.
Built with Rust.
Status: Beta - Core features working, optimizations ongoing