Crates.io | lmonade-runtime |
lib.rs | lmonade-runtime |
version | 0.1.0-alpha.2 |
created_at | 2025-08-20 14:10:53.345932+00 |
updated_at | 2025-08-21 22:07:16.999729+00 |
description | Actor-based runtime for LLM inference orchestration and resource management |
homepage | |
repository | https://jgok76.gitea.cloud/femtomc/lmonade |
max_upload_size | |
id | 1803469 |
size | 468,440 |
High-performance actor-based runtime for LLM inference orchestration.
This crate provides the core runtime infrastructure for Lmonade:
- src/actor/
- src/actor/model_hub.rs
- src/resources/
- src/llm_engine.rs
- src/model_runner.rs
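The actor pattern underlying these modules can be sketched with plain threads and channels. This is an illustrative sketch only, not the crate's API: the names `Msg`, `spawn_hub`, and the echo "inference" are hypothetical, and the real runtime uses async actors.

```rust
use std::sync::mpsc;
use std::thread;

// Messages an actor-style model hub might accept (hypothetical names).
enum Msg {
    LoadModel { model_id: String },
    Generate { prompt: String, reply: mpsc::Sender<String> },
    Shutdown,
}

// Spawn a hub actor: it owns its state and serially processes messages
// from a channel, so no locking is needed around that state.
fn spawn_hub() -> mpsc::Sender<Msg> {
    let (tx, rx) = mpsc::channel::<Msg>();
    thread::spawn(move || {
        let mut loaded: Vec<String> = Vec::new();
        for msg in rx {
            match msg {
                Msg::LoadModel { model_id } => loaded.push(model_id),
                Msg::Generate { prompt, reply } => {
                    // A real engine would run inference; we echo instead.
                    let _ = reply.send(format!("echo: {prompt}"));
                }
                Msg::Shutdown => break,
            }
        }
    });
    tx
}

fn main() {
    let hub = spawn_hub();
    hub.send(Msg::LoadModel { model_id: "tinyllama".into() }).unwrap();

    let (reply_tx, reply_rx) = mpsc::channel();
    hub.send(Msg::Generate { prompt: "Hello".into(), reply: reply_tx }).unwrap();
    println!("{}", reply_rx.recv().unwrap()); // prints "echo: Hello"

    hub.send(Msg::Shutdown).unwrap();
}
```

Because each actor processes one message at a time, callers interact with models only through message sends and replies, which is the property the hub design above relies on.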
The runtime uses an actor-based architecture in which models are loaded and queried by sending request messages to a central hub. Example usage:
use lmonade_runtime::actor::model_hub::ModelHub;
use lmonade_runtime::actor::messages::{LoadModelRequest, GenerateRequest};

// Calls below use `.await?`, so they must run inside an async function
// driven by an async runtime (e.g. tokio).
async fn example() -> Result<(), Box<dyn std::error::Error>> {
    // Create and start the model hub
    let hub = ModelHub::new(Default::default()).await?;

    // Load a model
    hub.load_model(LoadModelRequest {
        model_id: "tinyllama".to_string(),
        model_path: "/path/to/model".into(),
        config: Default::default(),
    }).await?;

    // Generate text
    let response = hub.generate(GenerateRequest {
        model_id: "tinyllama".to_string(),
        prompt: "Hello, world!".to_string(),
        params: Default::default(),
    }).await?;

    Ok(())
}
For detailed documentation, see the repository: https://jgok76.gitea.cloud/femtomc/lmonade
The runtime is under active development: basic TinyLlama inference is working, while production features such as distributed serving and advanced scheduling are still in progress.