| Crates.io | forge-orchestration |
| lib.rs | forge-orchestration |
| version | 0.4.2 |
| created_at | 2026-01-17 23:12:52.601275+00 |
| updated_at | 2026-01-18 01:17:37.513728+00 |
| description | Rust-native orchestration platform for distributed workloads with MoE routing, autoscaling, and Nomad integration |
| homepage | |
| repository | https://github.com/WeaveITMeta/Forge |
| max_upload_size | |
| id | 2051381 |
| size | 517,523 |
Rust-Native Orchestration Platform for Distributed Workloads
A high-performance orchestration platform for Rust, designed to manage distributed workloads at hyper-scale. It schedules 10-2000x faster than Kubernetes and uses intelligent bin-packing for optimal resource utilization.
| Scale | K8s Baseline | Forge Standard | Forge Optimized | Forge Batch |
|---|---|---|---|---|
| 100 nodes | 500/sec | 40,013/sec (80x) | 27,642/sec (55x) | 1,007,271/sec (2014x) |
| 500 nodes | 467/sec | 10,991/sec (24x) | 17,381/sec (37x) | 236,674/sec (506x) |
| 1000 nodes | 427/sec | 6,097/sec (14x) | 13,483/sec (32x) | 122,331/sec (287x) |
| 5000 nodes | 100/sec | 2,453/sec (25x) | 8,599/sec (86x) | 24,889/sec (249x) |
K8s baseline figures are taken from the Kubernetes Scheduling Framework documentation.
| Feature | Description |
|---|---|
| High-Performance Scheduler | 10-2000x faster than K8s with bin-packing, spread, GPU-locality algorithms |
| Control Plane | Kubernetes-style API server with admission controllers and watch streams |
| Multi-Region Federation | Geo-aware routing, cross-region replication, latency-based failover |
| MoE Routing | Intelligent request routing with learned, GPU-aware, and version-aware strategies |
| Autoscaling | Threshold-based and target-utilization policies with hysteresis |
| Resilience | Circuit breakers, exponential backoff retry, graceful degradation |
| Game Server SDK | UDP/TCP port allocation, session management, spot instance handling |
| AI/ML Inference | Request batching, SSE streaming for LLM tokens |
[dependencies]
forge-orchestration = "0.4.2"
tokio = { version = "1", features = ["full"] }
use forge_orchestration::{ForgeBuilder, AutoscalerConfig, Job, Task, Driver};

#[tokio::main]
async fn main() -> forge_orchestration::Result<()> {
    // Build the orchestrator
    let forge = ForgeBuilder::new()
        .with_autoscaler(AutoscalerConfig::default()
            .upscale_threshold(0.8)
            .downscale_threshold(0.3))
        .build()?;

    // Define and submit a job
    let job = Job::new("my-service")
        .with_group("api", Task::new("server")
            .driver(Driver::Exec)
            .command("/usr/bin/server")
            .args(vec!["--port", "8080"])
            .resources(500, 256));
    forge.submit_job(job).await?;

    // Run the control plane
    forge.run().await?;
    Ok(())
}
The SDK is included in the main crate under forge_orchestration::sdk:
use forge_orchestration::sdk::{ready, allocate_port, graceful_shutdown, shutdown_signal};

#[tokio::main]
async fn main() -> forge_orchestration::Result<()> {
    // Signal readiness to orchestrator
    ready()?;

    // Allocate a port dynamically
    let port = allocate_port(8000..9000)?;
    println!("Listening on port {}", port);

    // Install graceful shutdown handlers
    graceful_shutdown();

    // ... your server logic ...

    // Wait for shutdown signal
    shutdown_signal().await;
    Ok(())
}
[User App] --> [Forge SDK]  (ready(), allocate(), shutdown())
        |
        v
[Forge Control Plane]
    - Tokio Runtime (async loops)
    - Rayon (parallel alloc)
    - Raft (consensus)
    - State: RocksDB (local) + etcd (distributed)
    - MoE Router (gating to experts)
        |
        v
[Nomad Scheduler]  (jobs: containers/binaries)
        |
        v
[Workers/Nodes]
    - QUIC/TLS Networking
    - Prometheus Metrics
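As a minimal end-to-end sketch of that flow (the endpoint, state path, and resource figures are placeholders; it only reuses builder and job methods shown elsewhere in this README):

use forge_orchestration::{ForgeBuilder, Job, Task, Driver};

#[tokio::main]
async fn main() -> forge_orchestration::Result<()> {
    // Control plane delegating execution to a local Nomad agent.
    let forge = ForgeBuilder::new()
        .with_nomad_api("http://localhost:4646")
        .with_store_path("/var/lib/forge/state.json")
        .with_metrics(true)
        .build()?;

    // A submitted job travels SDK -> control plane -> Nomad -> workers.
    let job = Job::new("hello-web")
        .with_group("web", Task::new("server")
            .driver(Driver::Exec)
            .command("/usr/bin/server")
            .resources(250, 128));
    forge.submit_job(job).await?;
    forge.run().await?;
    Ok(())
}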
| Module | Description |
|---|---|
| job | Job, Task, TaskGroup, Driver definitions |
| moe | MoERouter trait, DefaultMoERouter, LoadAwareMoERouter, RoundRobinMoERouter |
| autoscaler | Autoscaler, AutoscalerConfig, ScalingPolicy trait |
| nomad | NomadClient for HashiCorp Nomad API |
| storage | StateStore trait, MemoryStore, FileStore |
| networking | HttpServer, QuicTransport |
| metrics | ForgeMetrics, MetricsExporter, MetricsHook trait |
| sdk | Workload SDK: ready(), allocate_port(), graceful_shutdown(), ForgeClient |
Built-in routers:
- DefaultMoERouter: Hash-based consistent routing
- LoadAwareMoERouter: Routes to least-loaded expert with affinity
- RoundRobinMoERouter: Sequential distribution

Custom router:
use forge_orchestration::moe::{MoERouter, RouteResult};
use async_trait::async_trait;

struct MyRouter;

#[async_trait]
impl MoERouter for MyRouter {
    async fn route(&self, input: &str, num_experts: usize) -> RouteResult {
        // Toy policy: derive the expert index from the input length.
        RouteResult::new(input.len() % num_experts)
    }

    fn name(&self) -> &str { "my-router" }
}
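A quick way to exercise the router directly (continuing the MyRouter definition above; this assumes nothing about RouteResult beyond the constructor already shown):

use forge_orchestration::moe::MoERouter;

#[tokio::main]
async fn main() {
    let router = MyRouter;
    // "user-42" is 7 bytes, so with 4 experts this selects expert 7 % 4 = 3.
    let _decision = router.route("user-42", 4).await;
    assert_eq!(router.name(), "my-router");
}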
use forge_orchestration::AutoscalerConfig;

// Scale up above 80% utilization, scale down below 30%, hold decisions
// for a 300-second hysteresis window, and keep between 1 and 100 instances.
let config = AutoscalerConfig::default()
    .upscale_threshold(0.8)
    .downscale_threshold(0.3)
    .hysteresis_secs(300)
    .bounds(1, 100);
use forge_orchestration::storage::{MemoryStore, FileStore};

// Ephemeral in-memory state vs. file-backed state that survives restarts.
let memory = MemoryStore::new();
let file = FileStore::open("/var/lib/forge/state.json")?;
use forge_orchestration::ForgeMetrics;

let metrics = ForgeMetrics::new()?;
// Count a job submission and a scale-up event, then export the
// gathered metrics in text form for Prometheus scraping.
metrics.record_job_submitted();
metrics.record_scale_event("my-job", "up");
let text = metrics.gather_text()?;
| Function | Description |
|---|---|
| sdk::ready() | Signal readiness to orchestrator |
| sdk::allocate_port(range) | Allocate an available port from range |
| sdk::release_port(port) | Release an allocated port |
| sdk::graceful_shutdown() | Install SIGTERM/SIGINT handlers |
| sdk::shutdown_signal() | Async wait for shutdown signal |
| sdk::ForgeClient | HTTP client for Forge API |
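release_port is the counterpart to allocate_port when a workload rotates listeners; a minimal sketch, assuming release_port returns a Result like the other SDK helpers:

use forge_orchestration::sdk::{allocate_port, release_port};

fn rebind() -> forge_orchestration::Result<()> {
    // Take a port from the allowed range, use it, then hand it back.
    let port = allocate_port(8000..9000)?;
    println!("bound to {}", port);
    release_port(port)?; // assumed fallible, mirroring allocate_port
    Ok(())
}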
| Variable | Description |
|---|---|
| FORGE_API | Forge API endpoint for SDK |
| FORGE_ALLOC_ID | Allocation ID (set by orchestrator) |
| FORGE_TASK_NAME | Task name (set by orchestrator) |
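Inside a task these are plain process environment variables, so they can be read with std::env; a minimal sketch (the fallback endpoint is a placeholder, not a documented default):

use std::env;

fn forge_context() {
    // FORGE_ALLOC_ID and FORGE_TASK_NAME are injected by the orchestrator;
    // FORGE_API tells the SDK where to reach the control plane.
    let api = env::var("FORGE_API").unwrap_or_else(|_| "http://127.0.0.1:8080".into());
    let alloc = env::var("FORGE_ALLOC_ID").unwrap_or_default();
    let task = env::var("FORGE_TASK_NAME").unwrap_or_default();
    println!("api={} alloc={} task={}", api, alloc, task);
}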
use forge_orchestration::{AutoscalerConfig, ForgeBuilder};

let forge = ForgeBuilder::new()
    .with_nomad_api("http://localhost:4646")
    .with_nomad_token("secret-token")
    .with_store_path("/var/lib/forge/state.json")
    .with_node_name("forge-1")
    .with_datacenter("dc1")
    .with_autoscaler(AutoscalerConfig::default())
    .with_metrics(true)
    .build()?;
Apache 2.0