lmonade-models

Crates.io	lmonade-models
lib.rs	lmonade-models
version	0.1.0-alpha.2
created_at	2025-08-20 14:09:29.522088+00
updated_at	2025-08-21 22:06:43.078102+00
description	LLM model architectures and serving components for the Lmonade inference engine
repository	https://jgok76.gitea.cloud/femtomc/lmonade
id	1803468
size	153,082

McCoy R. Becker (femtomc)

lmonade-models

Core model architectures and serving components for the Lmonade inference engine.

Overview

This crate provides:

- Model architectures (currently TinyLlama)
- Tensor operations and components (attention, feedforward, normalization)
- Serving infrastructure (paged KV cache, block management)
- Weight loading from SafeTensors and GGUF formats
- Batching strategies for inference
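
As an illustration of the normalization component mentioned above, here is a minimal sketch of RMS normalization, the variant used by Llama-family models such as TinyLlama. The function name `rms_norm` and its signature are assumptions made for this example and are not taken from the crate's API.

```rust
/// Root-mean-square normalization over a single hidden vector.
/// Hypothetical helper for illustration; not part of lmonade-models.
///
/// `weight` is the learned per-channel scale and `eps` keeps the
/// division stable when the input is close to zero.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), weight.len(), "input and weight must have the same length");
    if x.is_empty() {
        return Vec::new();
    }

    // Mean of the squared activations.
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;

    // Scale every element by 1 / sqrt(mean_sq + eps), then by its weight.
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}
```

Calling `rms_norm(&[3.0, 4.0], &[1.0, 1.0], 0.0)` rescales the vector so that its mean square is 1. A real implementation would operate on batched tensors rather than a single slice.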

Key Components

- Models: Architecture implementations (src/models/)
- Components: Building blocks like attention and feedforward layers (src/components/)
- Formats: Weight loading and model configuration (src/formats/)
- Serving: Production serving infrastructure (src/serving/)
  - Paged attention and KV cache management
  - Continuous batching for throughput optimization
  - Memory block management
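
To make the idea of memory block management for a paged KV cache concrete, the following is a minimal sketch of a fixed-size block pool with per-sequence block tables. The type `BlockAllocator` and its methods are hypothetical names chosen for this example; the crate's actual serving types may look quite different.

```rust
use std::collections::HashMap;

/// Hypothetical block pool for a paged KV cache (illustration only).
/// Physical blocks hold `block_size` tokens each and are handed out on demand.
pub struct BlockAllocator {
    block_size: usize,
    /// Indices of physical blocks that are currently unused.
    free: Vec<usize>,
    /// Sequence id -> ordered list of physical blocks (the block table).
    tables: HashMap<u64, Vec<usize>>,
    /// Sequence id -> number of tokens stored so far.
    lengths: HashMap<u64, usize>,
}

impl BlockAllocator {
    pub fn new(num_blocks: usize, block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be positive");
        Self {
            block_size,
            // Reversed so that `pop` hands out block 0 first.
            free: (0..num_blocks).rev().collect(),
            tables: HashMap::new(),
            lengths: HashMap::new(),
        }
    }

    /// Reserve a slot for one more token of `seq`.
    /// Returns `(physical_block, offset_in_block)`, or `None` if the pool is exhausted.
    pub fn append_token(&mut self, seq: u64) -> Option<(usize, usize)> {
        let len = *self.lengths.get(&seq).unwrap_or(&0);
        let offset = len % self.block_size;
        if offset == 0 {
            // The last block is full (or the sequence is new): take a fresh block.
            let block = self.free.pop()?;
            self.tables.entry(seq).or_default().push(block);
        }
        let block = *self.tables.get(&seq)?.last()?;
        self.lengths.insert(seq, len + 1);
        Some((block, offset))
    }

    /// Return all blocks owned by a finished sequence to the pool.
    pub fn release(&mut self, seq: u64) {
        if let Some(blocks) = self.tables.remove(&seq) {
            self.free.extend(blocks);
        }
        self.lengths.remove(&seq);
    }

    pub fn free_blocks(&self) -> usize {
        self.free.len()
    }
}
```

Because blocks are allocated only when a sequence crosses a block boundary, memory grows with the tokens actually generated instead of being reserved up front for a maximum length. A continuous-batching scheduler could admit a new request whenever `free_blocks()` is large enough and call `release` as soon as a sequence finishes.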

Usage

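use lmonade_models::formats::config::ModelConfig;
use lmonade_models::models::tinyllama::TinyLlamaModel;

// Wrapped in `main` so that `?` compiles; this assumes the crate's
// error types implement `std::error::Error`.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the model configuration from disk
    let config = ModelConfig::from_file("path/to/config.json")?;

    // Initialize the model from that configuration
    let model = TinyLlamaModel::new(&config)?;

    Ok(())
}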

Documentation

For detailed API documentation and architectural details, see:

Status

This crate is under active development. TinyLlama inference works only partially; performance and accuracy are still being improved.
