| Crates.io | swarm-engine-llm |
| lib.rs | swarm-engine-llm |
| version | 0.1.0 |
| created_at | 2026-01-25 17:51:34.325403+00 |
| updated_at | 2026-01-25 17:51:34.325403+00 |
| description | LLM integration backends for SwarmEngine |
| homepage | |
| repository | https://github.com/ytknishimura/swarm-engine |
| max_upload_size | |
| id | 2069175 |
| size | 312,654 |
A high-throughput, low-latency agent swarm execution engine written in Rust.
SwarmEngine is designed for running multiple AI agents in parallel with tick-based synchronization, optimized for batch LLM inference and real-time exploration scenarios.
Measured on the troubleshooting scenario (exploration-based, no per-tick LLM calls):
| Metric | Value |
|---|---|
| Throughput | ~80 actions/sec |
| Tick latency (exploration) | 0.1-0.2ms per action |
| Task completion | 5 actions in ~60ms |
Note: LLM-based decision making adds latency per call. The exploration-based mode uses graph traversal instead of per-tick LLM calls.
┌──────────────────────────────────────────────────────────────────────┐
│ SwarmEngine │
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Orchestrator │ │
│ │ │ │
│ │ Tick Loop: │ │
│ │ 1. Collect Async Results │ │
│ │ 2. Manager Phase (LLM Decision / Exploration) │ │
│ │ 3. Worker Execution (Parallel) │ │
│ │ 4. Merge Results │ │
│ │ 5. Tick Advance │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌────────────────────┐ │
│ │ SwarmState │ │ ExplorationSpace│ │ BatchInvoker │ │
│ │ ├─ SharedState │ │ ├─ GraphMap │ │ ├─ LlamaCppServer │ │
│ │ └─ WorkerStates│ │ └─ Operators │ │ └─ Ollama │ │
│ └─────────────────┘ └─────────────────┘ └────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
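The five tick-loop phases in the diagram map naturally onto a single orchestrator method. The following is a minimal, synchronous sketch with hypothetical names (`Orchestrator`, `Decision`, `ActionResult`); the real orchestrator in swarm-engine-core is async and its API is not shown here:

```rust
// Illustrative sketch of the five-phase tick loop; all names are hypothetical,
// not the actual swarm-engine-core API.
#[derive(Debug)]
struct Decision(String);

#[derive(Debug)]
struct ActionResult(String);

#[derive(Default)]
struct Orchestrator {
    tick: u64,
    shared_state: Vec<ActionResult>,
}

impl Orchestrator {
    fn run_tick(&mut self) {
        // 1. Collect results of async work finished since the last tick
        //    (stubbed here with a single fake result).
        let completed = vec![ActionResult("CheckStatus: nginx is down".into())];

        // 2. Manager phase: pick next actions (LLM decision or graph exploration).
        let decisions: Vec<Decision> = completed
            .iter()
            .map(|r| Decision(format!("follow-up for {r:?}")))
            .collect();

        // 3. Worker execution: run the decided actions (in parallel in the real engine).
        let results: Vec<ActionResult> = decisions
            .into_iter()
            .map(|d| ActionResult(format!("executed {}", d.0)))
            .collect();

        // 4. Merge worker results back into the shared swarm state.
        self.shared_state.extend(results);

        // 5. Advance the tick counter.
        self.tick += 1;
    }
}

fn main() {
    let mut orch = Orchestrator::default();
    orch.run_tick();
    println!("tick = {}", orch.tick);
}
```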
| Crate | Description |
|---|---|
| swarm-engine-core | Core runtime, orchestrator, state management, exploration, and learning |
| swarm-engine-llm | LLM integrations (llama.cpp server, Ollama, prompt building, batch processing) |
| swarm-engine-eval | Scenario-based evaluation framework with assertions and metrics |
| swarm-engine-ui | CLI and Desktop GUI (egui) |
# Clone the repository
git clone https://github.com/ytknishimura/swarm-engine.git
cd swarm-engine
# Build
cargo build --release
SwarmEngine uses llama.cpp server as the primary LLM backend. LFM2.5-1.2B is the recommended model for development and testing due to its balance of speed and quality.
# Using Hugging Face CLI (recommended)
pip install huggingface_hub
huggingface-cli download LiquidAI/LFM2.5-1.2B-Instruct-GGUF \
LFM2.5-1.2B-Instruct-Q4_K_M.gguf
# Or download directly from Hugging Face:
# https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF
# Start with the downloaded model (using glob pattern for snapshot hash)
cargo run --package swarm-engine-ui -- llama start \
-m ~/.cache/huggingface/hub/models--LiquidAI--LFM2.5-1.2B-Instruct-GGUF/snapshots/*/LFM2.5-1.2B-Instruct-Q4_K_M.gguf
# With custom options (GPU acceleration, parallel slots)
cargo run --package swarm-engine-ui -- llama start \
-m ~/.cache/huggingface/hub/models--LiquidAI--LFM2.5-1.2B-Instruct-GGUF/snapshots/*/LFM2.5-1.2B-Instruct-Q4_K_M.gguf \
--n-gpu-layers 99 \
--parallel 4 \
--ctx-size 4096
# Check if server is running and healthy
cargo run --package swarm-engine-ui -- llama status
# View server logs
cargo run --package swarm-engine-ui -- llama logs -f
# Stop the server
cargo run --package swarm-engine-ui -- llama stop
| Model | Size | Speed | Quality | Use Case |
|---|---|---|---|---|
| LFM2.5-1.2B | 1.2B | Fast | Good | Development, testing (recommended) |
| Qwen2.5-Coder-3B | 3B | Medium | Better | Complex scenarios |
| Qwen2.5-Coder-7B | 7B | Slow | Best | Production quality testing |
# Run a troubleshooting scenario
cargo run --package swarm-engine-ui -- eval crates/swarm-engine-eval/scenarios/troubleshooting.toml -n 5 -v
# With learning data collection
cargo run --package swarm-engine-ui -- eval crates/swarm-engine-eval/scenarios/troubleshooting.toml -n 5 --learning
# Show help
cargo run --package swarm-engine-ui -- --help
# Initialize configuration
cargo run --package swarm-engine-ui -- init
# Show current configuration
cargo run --package swarm-engine-ui -- config
# Open scenarios directory
cargo run --package swarm-engine-ui -- open scenarios
# Launch Desktop GUI
cargo run --package swarm-engine-ui -- --gui
Scenarios are defined in TOML format and describe the task, environment, actions, and success criteria:
[meta]
name = "Service Troubleshooting"
id = "user:troubleshooting:v2"
description = "Diagnose and fix a service outage"
[task]
goal = "Diagnose the failing service and restart it"
[llm]
provider = "llama-server"
model = "LFM2.5-1.2B"
endpoint = "http://localhost:8080"
[[actions.actions]]
name = "CheckStatus"
description = "Check the status of services"
[[actions.actions]]
name = "ReadLogs"
description = "Read logs for a specific service"
[app_config]
tick_duration_ms = 10
max_ticks = 150
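For illustration, a scenario file in this shape can be deserialized with serde and the toml crate. The struct names below are hypothetical mirrors of the fields shown above, not the actual swarm-engine-eval types; sections not listed (such as `[[actions.actions]]`) are simply ignored by serde's default behavior:

```rust
// Hypothetical structs mirroring the scenario fields shown above;
// the real swarm-engine-eval types may differ.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Scenario {
    meta: Meta,
    task: Task,
    llm: LlmConfig,
    app_config: AppConfig,
}

#[derive(Debug, Deserialize)]
struct Meta {
    name: String,
    id: String,
    description: String,
}

#[derive(Debug, Deserialize)]
struct Task {
    goal: String,
}

#[derive(Debug, Deserialize)]
struct LlmConfig {
    provider: String,
    model: String,
    endpoint: String,
}

#[derive(Debug, Deserialize)]
struct AppConfig {
    tick_duration_ms: u64,
    max_ticks: u64,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let text = std::fs::read_to_string("crates/swarm-engine-eval/scenarios/troubleshooting.toml")?;
    let scenario: Scenario = toml::from_str(&text)?;
    println!("goal: {}", scenario.task.goal);
    Ok(())
}
```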
Scenarios can define variants for different configurations:
# List available variants
cargo run --package swarm-engine-ui -- eval crates/swarm-engine-eval/scenarios/troubleshooting.toml --list-variants
# Run with a specific variant
cargo run --package swarm-engine-ui -- eval crates/swarm-engine-eval/scenarios/troubleshooting.toml --variant complex
SwarmEngine includes a comprehensive learning system with offline parameter optimization and LoRA fine-tuning support.
┌─────────────────────────────────────────────────────────────────┐
│ Learning System │
│ │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ Data Collection ││
│ │ Eval (--learning) → ActionEvents → Session Snapshots ││
│ └─────────────────────────────────────────────────────────────┘│
│ ↓ │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ Offline Analysis ││
│ │ learn once → Stats Analysis → OptimalParamsModel ││
│ │ → RecommendedPaths ││
│ └─────────────────────────────────────────────────────────────┘│
│ ↓ │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ Model Application ││
│ │ Next Eval → Load OfflineModel → Apply Parameters ││
│ │ → LoRA Adapter (optional) ││
│ └─────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────┘
# 1. Collect data with --learning flag
cargo run --package swarm-engine-ui -- eval crates/swarm-engine-eval/scenarios/troubleshooting.toml -n 30 --learning
# 2. Run offline learning
cargo run --package swarm-engine-ui -- learn once troubleshooting
# 3. Next eval run will automatically use the learned model
cargo run --package swarm-engine-ui -- eval crates/swarm-engine-eval/scenarios/troubleshooting.toml -n 5 -v
# → "Offline model loaded: ucb1_c=X.XXX, strategy=..."
| Model | Purpose | Lifetime |
|---|---|---|
| ScoreModel | Action selection scores (transitions, N-gram patterns) | 1 session |
| OptimalParamsModel | Parameter optimization (ucb1_c, thresholds) | Cross-session |
| LoRA Adapter | LLM fine-tuning for decision quality | Persistent |
{
"parameters": {
"ucb1_c": 1.414, // UCB1 exploration constant
"learning_weight": 0.3, // Learning weight for selection
"ngram_weight": 1.0 // N-gram pattern weight
},
"strategy_config": {
"initial_strategy": "ucb1",
"maturity_threshold": 5,
"error_rate_threshold": 0.45
},
"recommended_paths": [...] // Optimal action sequences
}
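A model in this shape can be loaded back with serde_json. The struct below is a hypothetical mirror of the fields shown above, not the actual OptimalParamsModel definition in swarm-engine-core; the file path follows the learning data layout described later in this README:

```rust
// Hypothetical mirror of the offline model JSON shown above.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct OfflineModel {
    parameters: Parameters,
    strategy_config: StrategyConfig,
}

#[derive(Debug, Deserialize)]
struct Parameters {
    ucb1_c: f64,
    learning_weight: f64,
    ngram_weight: f64,
}

#[derive(Debug, Deserialize)]
struct StrategyConfig {
    initial_strategy: String,
    maturity_threshold: u32,
    error_rate_threshold: f64,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let home = std::env::var("HOME")?;
    let path = format!("{home}/.swarm-engine/learning/scenarios/troubleshooting/offline_model.json");
    let model: OfflineModel = serde_json::from_str(&std::fs::read_to_string(path)?)?;
    println!("ucb1_c = {}", model.parameters.ucb1_c);
    Ok(())
}
```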
For continuous learning during long-running evaluations:
# Start daemon mode (monitors and learns continuously)
cargo run --package swarm-engine-ui -- learn daemon troubleshooting
# Daemon features:
# - Watches for new session data
# - Triggers learning based on configurable conditions
# - Applies learned models via Blue-Green deployment
Fine-tune the LLM for improved decision quality:
# LoRA training requires:
# - Episode data collected from successful runs
# - llama.cpp with LoRA support
# - Training triggers (count, time, or quality-based)
~/.swarm-engine/learning/
├── global_stats.json # Global statistics across scenarios
└── scenarios/
└── troubleshooting/ # Per-scenario (learning_key based)
├── stats.json # Accumulated statistics
├── offline_model.json # Learned parameters
├── lora/ # LoRA adapters (if trained)
│ └── v1/
│ └── adapter.safetensors
└── sessions/ # Session snapshots
└── {timestamp}/
├── meta.json
└── stats.json
The learning system optimizes selection strategy parameters:
| Strategy | Description | When Used |
|---|---|---|
| UCB1 | Upper Confidence Bound | Early exploration |
| Thompson | Bayesian sampling | Probabilistic exploration |
| Greedy | Best known action | Exploitation after learning |
| Adaptive | Dynamic switching | Production (based on error rate) |
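As a concrete reference for the UCB1 row above: the standard UCB1 score is the mean reward plus an exploration bonus `c * sqrt(ln(total_visits) / action_visits)`, where `c` corresponds to the learned `ucb1_c` parameter (1.414 ≈ √2 in the example JSON). A minimal sketch of that formula, not the engine's actual selector code:

```rust
// Standard UCB1 scoring; a minimal sketch, not the engine's actual selector.
// `c` corresponds to the learned `ucb1_c` parameter (e.g. 1.414 ≈ sqrt(2)).
struct ActionStats {
    visits: u64,
    total_reward: f64,
}

fn ucb1_score(stats: &ActionStats, total_visits: u64, c: f64) -> f64 {
    if stats.visits == 0 {
        return f64::INFINITY; // always try unvisited actions first
    }
    let mean = stats.total_reward / stats.visits as f64;
    let bonus = c * ((total_visits as f64).ln() / stats.visits as f64).sqrt();
    mean + bonus
}

fn main() {
    // Hypothetical statistics for two actions from the example scenario.
    let actions = vec![
        ("CheckStatus", ActionStats { visits: 10, total_reward: 7.0 }),
        ("ReadLogs", ActionStats { visits: 3, total_reward: 2.5 }),
    ];
    let total: u64 = actions.iter().map(|(_, s)| s.visits).sum();
    let best = actions
        .iter()
        .max_by(|a, b| {
            ucb1_score(&a.1, total, 1.414)
                .partial_cmp(&ucb1_score(&b.1, total, 1.414))
                .unwrap()
        })
        .unwrap();
    println!("select: {}", best.0);
}
```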
llama.cpp server provides true batch processing with continuous batching:
cargo run --package swarm-engine-ui -- llama start \
-m model.gguf \
--parallel 4 \
--ctx-size 4096 \
--n-gpu-layers 99
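With `--parallel 4`, the server can hold four requests in flight and interleave them via continuous batching, so a batch invoker only needs to issue requests concurrently. The sketch below assumes the OpenAI-compatible `/v1/chat/completions` endpoint exposed by llama.cpp server and the tokio, reqwest, futures, and serde_json crates; it is not the swarm-engine-llm BatchInvoker itself:

```rust
// Sketch only: fires several prompts at the llama.cpp server concurrently so
// its continuous batching (--parallel slots) can interleave them.
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::Client::new();
    let prompts = ["Check service status", "Read nginx logs", "Summarize errors"];

    let requests = prompts.iter().map(|p| {
        let client = client.clone();
        async move {
            client
                .post("http://localhost:8080/v1/chat/completions")
                .json(&json!({
                    "model": "LFM2.5-1.2B",
                    "messages": [{ "role": "user", "content": p }],
                    "max_tokens": 128
                }))
                .send()
                .await?
                .json::<serde_json::Value>()
                .await
        }
    });

    // All requests are in flight at once; the server spreads them across slots.
    let responses = futures::future::join_all(requests).await;
    for r in responses {
        println!("{}", r?["choices"][0]["message"]["content"]);
    }
    Ok(())
}
```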
Ollama can be used but does not support true batch processing:
ollama serve
Note: Ollama processes requests sequentially internally, so throughput measurements may not reflect true parallel performance.
Configuration file (~/.swarm-engine/config.toml):
[general]
default_project_type = "eval"
[eval]
default_runs = 30
target_tick_duration_ms = 10
[llm]
default_provider = "llama-server"
cache_enabled = true
[logging]
level = "info"
file_enabled = true
| Path | Purpose |
|---|---|
| ~/.swarm-engine/ | System configuration, cache, logs |
| ~/swarm-engine/ | User data: scenarios, reports |
| ./swarm-engine/ | Project-local configuration |
# Type check
cargo check
# Build
cargo build
# Run tests
cargo test
# Run with verbose logging
RUST_LOG=debug cargo run --package swarm-engine-ui -- eval ...
swarm-engine/
├── crates/
│ ├── swarm-engine-core/ # Core runtime
│ │ ├── src/
│ │ │ ├── orchestrator/ # Main loop
│ │ │ ├── agent/ # Worker/Manager definitions
│ │ │ ├── exploration/ # Graph-based exploration
│ │ │ ├── learn/ # Offline learning
│ │ │ └── ...
│ ├── swarm-engine-llm/ # LLM integrations
│ ├── swarm-engine-eval/ # Evaluation framework
│ │ └── scenarios/ # Built-in scenarios
│ └── swarm-engine-ui/ # CLI and GUI
Detailed design documentation is available in the RustDoc comments of each crate:
# Generate and open documentation
cargo doc --open --no-deps
Key documentation locations:
MIT License