| Crates.io | litert-lm |
| lib.rs | litert-lm |
| version | 0.2.1 |
| created_at | 2025-11-01 12:53:25.332964+00 |
| updated_at | 2025-11-02 02:00:41.592393+00 |
| description | Rust wrapper for LiteRT-LM providing MCP and OpenAI-compatible interfaces with auto-download, process pools, and streaming |
| homepage | https://github.com/maceip/rlitert-lm |
| repository | https://github.com/maceip/rlitert-lm.git |
| max_upload_size | |
| id | 1911966 |
| size | 292,883 |
Rust wrapper around LiteRT-LM providing MCP and OpenAI-compatible interfaces. Auto-downloads platform binary, manages process pools, exposes model operations as tools.
Install the CLI:

cargo install litert-lm

Or add the library to your Cargo.toml:

[dependencies]
litert-lm = "0.2"
Run as an MCP server to expose LiteRT-LM as tools for AI assistants like Claude Desktop.
litert-lm mcp --transport stdio
Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):
{
  "mcpServers": {
    "litert-lm": {
      "command": "/path/to/litert-lm",
      "args": ["mcp", "--transport", "stdio"]
    }
  }
}
Or run the MCP server over SSE:

litert-lm mcp --transport sse --port 3000
Tools:
- list_models - List downloaded or available models
- pull_model - Download a model with real-time progress
- remove_model - Delete a downloaded model
- run_completion - Generate text completions
- check_download_progress - Query download status

Resources:
litert://downloads/{model} - Subscribe to live download progress for any model in the registry
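For orientation, this is the kind of tools/call request an MCP client sends over the wire (JSON-RPC) to invoke run_completion. The argument names model and prompt are assumptions mirroring the Rust run_completion(model, prompt) API, not a documented schema; check the server's tool listing for the actual parameters.

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "run_completion",
    "arguments": {
      "model": "gemma-3n-E4B",
      "prompt": "What is the capital of France?"
    }
  }
}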
Use litert-lm directly in your Rust code for model inference.
use litert_lm::{LitManager, Result};

#[tokio::main]
async fn main() -> Result<()> {
    // Initialize manager (auto-downloads lit binary if needed)
    let manager = LitManager::new().await?;

    // Download a model
    println!("Downloading model...");
    manager.pull("gemma-3n-E4B", None, None).await?;

    // Run inference
    let response = manager
        .run_completion("gemma-3n-E4B", "What is the capital of France?")
        .await?;
    println!("Response: {}", response);

    Ok(())
}
Download a model with a real-time progress callback:

use litert_lm::{LitManager, Result};

#[tokio::main]
async fn main() -> Result<()> {
    let manager = LitManager::new().await?;

    // Download with real-time progress callback
    manager.pull_with_progress(
        "gemma-3n-E4B",
        None,
        None,
        |progress| {
            println!("Download progress: {:.1}%", progress);
        }
    ).await?;

    // Run completion
    let response = manager
        .run_completion("gemma-3n-E4B", "Hello!")
        .await?;
    println!("{}", response);

    Ok(())
}
Stream tokens as they are generated:

use litert_lm::{LitManager, Result};
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> Result<()> {
    let manager = LitManager::new().await?;

    let mut stream = manager
        .run_completion_stream("gemma-3n-E4B", "Tell me a story")
        .await?;

    while let Some(chunk) = stream.next().await {
        print!("{}", chunk?);
    }

    Ok(())
}
Run an OpenAI-compatible server:
litert-lm serve --port 8080
Query it with curl:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-3n-E4B",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'
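The same endpoint can also be called from Rust. A minimal non-streaming sketch using reqwest (with the json feature), tokio, and serde_json, assuming the server returns the standard OpenAI chat-completion response shape:

use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::Client::new();

    // Standard OpenAI-style chat completion request (non-streaming).
    let body = json!({
        "model": "gemma-3n-E4B",
        "messages": [{"role": "user", "content": "Hello"}]
    });

    let response: serde_json::Value = client
        .post("http://localhost:8080/v1/chat/completions")
        .json(&body)
        .send()
        .await?
        .json()
        .await?;

    // First choice's message content, per the usual OpenAI response layout.
    println!("{}", response["choices"][0]["message"]["content"]);
    Ok(())
}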
# List all available models in registry
litert-lm list --show-all
# List downloaded models only
litert-lm list
Some models require a Hugging Face token. Set via environment variable or flag:
export HUGGING_FACE_HUB_TOKEN=hf_your_token
litert-lm pull gemma3-1b
See tests/mcp-tests/ for comprehensive MCP integration tests:
cd tests/mcp-tests
uv run test_mcp_client.py # Basic stdio test
uv run test_mcp_sse.py # SSE transport test
uv run test_mcp_download_quick.py # Download progress test
License: MIT