lmonade-models

version: 0.1.0-alpha.2
created_at: 2025-08-20 14:09:29.522088+00
updated_at: 2025-08-21 22:06:43.078102+00
description: LLM model architectures and serving components for the Lmonade inference engine
repository: https://jgok76.gitea.cloud/femtomc/lmonade
id: 1803468
size: 153,082
McCoy R. Becker (femtomc)

lmonade-models

Core model architectures and serving components for the Lmonade inference engine.

Overview

This crate provides:

  • Model architectures (currently TinyLlama)
  • Tensor operations and components (attention, feedforward, normalization)
  • Serving infrastructure (paged KV cache, block management)
  • Weight loading from SafeTensors and GGUF formats
  • Batching strategies for inference
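
The normalization component can be illustrated with RMSNorm, which the Llama 2 architecture (and therefore TinyLlama) uses in place of LayerNorm. The sketch below is a hedged, scalar illustration of what such a layer computes; the function name `rms_norm` and its slice-based signature are hypothetical and are not lmonade's API.

```rust
/// Hypothetical scalar sketch of RMSNorm (not lmonade's tensor implementation):
/// scale each element by the reciprocal root-mean-square of the whole vector,
/// then by a learned per-channel weight.
pub fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), weight.len(), "weight must match the hidden size");
    // Mean of squares over the hidden dimension.
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    // Reciprocal RMS; eps keeps the division finite for all-zero inputs.
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}
```

Unlike LayerNorm, there is no mean subtraction and no bias. The published TinyLlama-1.1B config.json sets `rms_norm_eps` to 1e-5; a real implementation would apply this row-wise over whole tensors rather than a single slice.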

Key Components

  • Models: Architecture implementations (src/models/)
  • Components: Building blocks like attention and feedforward layers (src/components/)
  • Formats: Weight loading and model configuration (src/formats/)
  • Serving: Production serving infrastructure (src/serving/)
    • Paged attention and KV cache management
    • Continuous batching for throughput optimization
    • Memory block management
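
The paged KV cache and block management listed above follow the idea popularized by vLLM's PagedAttention: KV memory is carved into fixed-size physical blocks, and each sequence keeps a block table mapping its logical blocks to physical ones, so memory is allocated on demand instead of being reserved up front for the maximum sequence length. A minimal sketch, assuming a fixed block size in tokens and integer sequence ids; `BlockAllocator` and its methods are hypothetical names, not lmonade's serving API.

```rust
use std::collections::HashMap;

/// Hypothetical paged KV-cache block manager (illustrative, not lmonade's API).
pub struct BlockAllocator {
    block_size: usize,                // tokens stored per physical block
    free: Vec<usize>,                 // ids of unused physical blocks
    tables: HashMap<u64, Vec<usize>>, // sequence id -> block table (logical -> physical)
    lens: HashMap<u64, usize>,        // sequence id -> number of tokens stored
}

impl BlockAllocator {
    pub fn new(num_blocks: usize, block_size: usize) -> Self {
        Self {
            block_size,
            // Reversed so that pop() hands out block 0 first.
            free: (0..num_blocks).rev().collect(),
            tables: HashMap::new(),
            lens: HashMap::new(),
        }
    }

    /// Reserve a slot for one more token of `seq`. A new physical block is
    /// taken only when the sequence's last block is full (or it has none yet).
    /// Returns (physical block, offset within block), or None if the pool is exhausted.
    pub fn append_token(&mut self, seq: u64) -> Option<(usize, usize)> {
        let len = *self.lens.get(&seq).unwrap_or(&0);
        let offset = len % self.block_size;
        let table = self.tables.entry(seq).or_default();
        if offset == 0 {
            table.push(self.free.pop()?);
        }
        let block = *table.last().unwrap();
        self.lens.insert(seq, len + 1);
        Some((block, offset))
    }

    /// Return every block of a finished sequence to the free pool.
    pub fn free_sequence(&mut self, seq: u64) {
        if let Some(blocks) = self.tables.remove(&seq) {
            self.free.extend(blocks);
        }
        self.lens.remove(&seq);
    }

    pub fn num_free(&self) -> usize {
        self.free.len()
    }

    pub fn block_table(&self, seq: u64) -> &[usize] {
        self.tables.get(&seq).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

Because a finished sequence's blocks go back to the pool immediately, a continuous-batching scheduler can admit waiting requests as soon as running ones complete, and internal fragmentation is bounded by one partially filled block per sequence.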

Usage

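use lmonade_models::models::tinyllama::TinyLlamaModel;
use lmonade_models::formats::config::ModelConfig;

// `?` needs a function that returns Result; this assumes the crate's
// error types convert into Box<dyn std::error::Error>.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load model configuration
    let config = ModelConfig::from_file("path/to/config.json")?;

    // Initialize model
    let model = TinyLlamaModel::new(&config)?;

    Ok(())
}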
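
The loaded configuration also determines how much memory the KV cache needs. As a hedged worked example, the sketch below computes bytes per token and per cache block from a Llama-style config; `KvShape` is a hypothetical struct whose field names mirror the Hugging Face config.json keys and is not lmonade's `ModelConfig`. The 16-token block size used in the note below is an assumption.

```rust
/// Hypothetical subset of a Llama-style config.json (not lmonade's ModelConfig).
pub struct KvShape {
    pub num_hidden_layers: usize,
    pub num_attention_heads: usize,
    pub num_key_value_heads: usize,
    pub hidden_size: usize,
}

impl KvShape {
    /// Per-head dimension; assumes hidden_size divides evenly across query heads.
    pub fn head_dim(&self) -> usize {
        self.hidden_size / self.num_attention_heads
    }

    /// KV-cache bytes per token: one K and one V vector per KV head per layer.
    /// Grouped-query attention (num_key_value_heads < num_attention_heads)
    /// shrinks this relative to full multi-head attention.
    pub fn kv_bytes_per_token(&self, bytes_per_elem: usize) -> usize {
        2 * self.num_hidden_layers * self.num_key_value_heads * self.head_dim() * bytes_per_elem
    }

    /// Bytes for one paged-cache block holding `block_size` tokens.
    pub fn kv_bytes_per_block(&self, block_size: usize, bytes_per_elem: usize) -> usize {
        block_size * self.kv_bytes_per_token(bytes_per_elem)
    }
}
```

With the values from the published TinyLlama-1.1B config.json (22 layers, 32 attention heads, 4 key/value heads, hidden size 2048), `head_dim` is 64, and f16 storage (2 bytes per element) needs 2 × 22 × 4 × 64 × 2 = 22,528 bytes per token, or 360,448 bytes per 16-token block; a 1 GiB budget therefore holds 2,978 such blocks.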

Documentation

For detailed API documentation and architectural details, see:

Status

This crate is under active development. TinyLlama inference partially works, and optimizations for performance and accuracy are ongoing.
