
litert-lm-rs

Safe, idiomatic Rust bindings for the LiteRT-LM C API.

Quick Start

# 1. Clone and build LiteRT-LM C library
git clone https://github.com/google-ai-edge/LiteRT-LM
cd LiteRT-LM
bazel build //c:engine

# 2. Clone this repo
cd ..
git clone https://github.com/maceip/litert-lm-rs
cd litert-lm-rs

# 3. Build and run example
export LD_LIBRARY_PATH=../LiteRT-LM/bazel-bin/c:$LD_LIBRARY_PATH
cargo run --example simple_chat -- model.tflite

About

LiteRT-LM is Google's lightweight runtime for on-device large language models. This crate provides safe, ergonomic Rust bindings so you can run LiteRT-LM models directly from Rust applications.

Upstream Repository: https://github.com/google-ai-edge/LiteRT-LM

Use Cases

  • On-device AI applications: Run language models locally without cloud dependencies
  • Privacy-focused apps: Keep user data on-device with local inference
  • Embedded systems: Deploy LLMs on resource-constrained devices
  • Rust applications: Integrate LiteRT-LM into Rust-based services and tools
  • Cross-platform development: Build portable AI applications for Linux and macOS

Interfaces

Engine

The primary interface for loading and managing a language model. An Engine owns the loaded model and can create multiple sessions.

API:

  • Engine::new(model_path, backend) - Load a .tflite model file with CPU or GPU backend
  • engine.create_session() - Create a new conversation session

Use when: You need to load a model once and create multiple independent conversation contexts.
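
For example, a minimal sketch of one engine serving two independent sessions, using only the API shown above:

let engine = Engine::new("model.tflite", Backend::Cpu)?;

// Each session keeps its own, independent conversation history.
let session_a = engine.create_session()?;
let session_b = engine.create_session()?;

let reply_a = session_a.generate("Hello from session A")?;
let reply_b = session_b.generate("Hello from session B")?;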

Session

Represents a stateful conversation context. Each session maintains its own history and can generate text responses.

API:

  • session.generate(prompt) - Generate text from a prompt
  • session.get_benchmark_info() - Get performance metrics

Use when: You need to maintain conversation history or generate multiple related responses.

Backend

Specifies the hardware backend for model execution.

Options:

  • Backend::Cpu - CPU-based inference
  • Backend::Gpu - GPU-accelerated inference (if available)
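
If GPU support is uncertain on the target device, one pattern is to try the GPU backend and fall back to CPU. This is a sketch; the exact error returned for an unavailable backend depends on the underlying C library:

use litert_lm::{Engine, Backend};

fn load_engine(model_path: &str) -> Result<Engine, Box<dyn std::error::Error>> {
    // Prefer GPU, but fall back to CPU if engine creation fails.
    match Engine::new(model_path, Backend::Gpu) {
        Ok(engine) => Ok(engine),
        Err(_) => Ok(Engine::new(model_path, Backend::Cpu)?),
    }
}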

Features

  • Automatic FFI generation: Uses bindgen to auto-generate bindings from c/engine.h
  • Safe API: Memory-safe Rust wrappers around C FFI
  • Thread-safe: Engine and Session types implement Send/Sync where appropriate
  • Idiomatic Rust: Result-based error handling, RAII resource management
  • Portable: Works on Linux and macOS with simple build process
  • Zero maintenance: Bindings are regenerated from c/engine.h at build time, so they stay in sync with the C API automatically

Building

Prerequisites

This crate requires the LiteRT-LM C library to be built first.

  1. Clone LiteRT-LM:

    git clone https://github.com/google-ai-edge/LiteRT-LM
    cd LiteRT-LM
    
  2. Build the C API library:

    bazel build //c:engine
    

    This creates:

    • bazel-bin/c/libengine.so (Linux)
    • bazel-bin/c/libengine.dylib (macOS)

Build the Rust bindings

cargo build --release

The build script will automatically:

  • Run bindgen to generate FFI bindings from c/engine.h
  • Link against bazel-bin/c/libengine.so

Runtime library path

When running your application, ensure the dynamic linker can find libengine.so (libengine.dylib on macOS):

export LD_LIBRARY_PATH=/path/to/LiteRT-LM/bazel-bin/c:$LD_LIBRARY_PATH

On macOS, set DYLD_LIBRARY_PATH instead.

Or build with rpath:

RUSTFLAGS="-C link-args=-Wl,-rpath,/path/to/LiteRT-LM/bazel-bin/c" cargo build
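
To persist the rpath instead of exporting RUSTFLAGS in every shell, the same flag can live in .cargo/config.toml (the path below is a placeholder for your LiteRT-LM checkout):

# .cargo/config.toml
[build]
rustflags = ["-C", "link-args=-Wl,-rpath,/path/to/LiteRT-LM/bazel-bin/c"]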

Usage

Basic example

use litert_lm::{Engine, Backend};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load model
    let engine = Engine::new("model.tflite", Backend::Cpu)?;

    // Create conversation session
    let session = engine.create_session()?;

    // Generate text
    let response = session.generate("Hello, how are you?")?;
    println!("Response: {}", response);

    Ok(())
}

Interactive chat example

use litert_lm::{Engine, Backend};
use std::io::{self, Write};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let engine = Engine::new("model.tflite", Backend::Cpu)?;
    let session = engine.create_session()?;

    loop {
        print!("You: ");
        io::stdout().flush()?;

        let mut input = String::new();
        io::stdin().read_line(&mut input)?;

        if input.trim().eq_ignore_ascii_case("quit") {
            break;
        }

        match session.generate(input.trim()) {
            Ok(response) => println!("Assistant: {}", response),
            Err(e) => eprintln!("Error: {}", e),
        }
    }

    Ok(())
}

Running examples

# Simple interactive chat
cargo run --example simple_chat -- /path/to/model.tflite

# Batch inference
cargo run --example batch_inference -- /path/to/model.tflite

API Overview

Engine

The Engine is the main entry point for loading and managing a language model.

let engine = Engine::new("model.tflite", Backend::Cpu)?;

Methods:

  • new(model_path: &str, backend: Backend) -> Result<Engine> - Create engine from model file
  • create_session(&self) -> Result<Session> - Create a new conversation session

Session

A Session represents a conversation context with history.

let session = engine.create_session()?;
let response = session.generate("Hello!")?;

Methods:

  • generate(&self, prompt: &str) -> Result<String> - Generate response to prompt
  • get_benchmark_info(&self) -> Result<BenchmarkInfo> - Get performance metrics
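
A short sketch of reading metrics after a generation. The exact fields of BenchmarkInfo are defined by the crate, so Debug output is assumed here:

let session = engine.create_session()?;
let _ = session.generate("Warm-up prompt")?;

// Inspect the performance metrics collected for this session.
// (Field names depend on the crate; Debug formatting is assumed.)
let info = session.get_benchmark_info()?;
println!("Benchmark: {:?}", info);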

Backend

pub enum Backend {
    Cpu,  // CPU backend
    Gpu,  // GPU backend (if available)
}

Error Handling

All operations return Result<T, Error> for proper error handling:

match session.generate("prompt") {
    Ok(response) => println!("Success: {}", response),
    Err(e) => eprintln!("Error: {}", e),
}

Memory Management

The wrapper uses RAII (Resource Acquisition Is Initialization) for automatic cleanup:

  • Engine automatically calls litert_lm_engine_delete on drop
  • Session automatically calls litert_lm_session_delete on drop
  • Generated strings are automatically freed

No manual memory management required.
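
Conceptually, the cleanup looks like this sketch; the raw field and the extern signature are illustrative stand-ins for the generated bindings, not the crate's actual internals. Session is analogous, via litert_lm_session_delete:

// Illustrative sketch only; mirrors the generated bindings.
extern "C" {
    fn litert_lm_engine_delete(engine: *mut core::ffi::c_void);
}

pub struct Engine {
    raw: *mut core::ffi::c_void,
}

impl Drop for Engine {
    fn drop(&mut self) {
        // Frees the underlying C engine exactly once, when the
        // wrapper goes out of scope.
        unsafe { litert_lm_engine_delete(self.raw) }
    }
}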

Thread Safety

  • Engine: Implements Send + Sync - can be shared between threads
  • Session: Implements Send - can be moved between threads, but not shared
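
A sketch of that pattern, sharing one Engine across threads while each thread creates and owns its Session:

use std::sync::Arc;
use std::thread;

// Engine is Send + Sync, so one engine can be shared behind an Arc.
let engine = Arc::new(Engine::new("model.tflite", Backend::Cpu)?);

let handles: Vec<_> = (0..2)
    .map(|i| {
        let engine = Arc::clone(&engine);
        thread::spawn(move || {
            // Each thread gets its own Session (Send, but not Sync).
            let session = engine.create_session().expect("create_session");
            session
                .generate(&format!("Prompt from thread {i}"))
                .expect("generate")
        })
    })
    .collect();

for handle in handles {
    println!("{}", handle.join().expect("worker thread panicked"));
}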

Build Process Details

The build process is designed to be simple and portable:

  1. build.rs runs bindgen:

    • Reads c/engine.h from LiteRT-LM
    • Generates Rust FFI bindings
    • Writes to $OUT_DIR/bindings.rs
  2. Links against minimal dependencies:

    • libengine.so (or .dylib on macOS)
    • C++ standard library (stdc++ on Linux, c++ on macOS)
    • No need to manually link dozens of libraries
  3. Your code uses the safe wrapper:

    • src/lib.rs provides safe Rust API
    • Automatically includes generated bindings
    • Users never see unsafe code
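
Condensed, the steps above amount to a build.rs like this sketch (paths and error handling simplified; the crate's actual script may differ):

// build.rs (condensed sketch)
use std::{env, path::PathBuf};

fn main() {
    let litert = "../LiteRT-LM"; // assumed checkout location

    // 1. Generate FFI bindings from the C header.
    let bindings = bindgen::Builder::default()
        .header(format!("{litert}/c/engine.h"))
        .generate()
        .expect("failed to generate bindings");
    let out = PathBuf::from(env::var("OUT_DIR").unwrap());
    bindings
        .write_to_file(out.join("bindings.rs"))
        .expect("failed to write bindings");

    // 2. Link against the prebuilt C library.
    println!("cargo:rustc-link-search=native={litert}/bazel-bin/c");
    println!("cargo:rustc-link-lib=dylib=engine");
}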

Troubleshooting

"cannot find -lengine"

The C API library is not in your library path.

Solution: Build the C library first:

cd /path/to/LiteRT-LM
bazel build //c:engine

"cannot find libengine.so at runtime"

At runtime, the dynamic linker can't find the library.

Solution: Set LD_LIBRARY_PATH (DYLD_LIBRARY_PATH on macOS):

export LD_LIBRARY_PATH=/path/to/LiteRT-LM/bazel-bin/c:$LD_LIBRARY_PATH

Or build with rpath:

RUSTFLAGS="-C link-args=-Wl,-rpath,/path/to/bazel-bin/c" cargo build

License

Apache-2.0

Contributing

Contributions welcome! Please ensure:

  • Code follows Rust conventions (cargo fmt)
  • All tests pass (cargo test)
  • New features include tests and documentation