forge-orchestration

Crates.ioforge-orchestration
lib.rsforge-orchestration
version0.4.2
created_at2026-01-17 23:12:52.601275+00
updated_at2026-01-18 01:17:37.513728+00
descriptionRust-native orchestration platform for distributed workloads with MoE routing, autoscaling, and Nomad integration
homepage
repositoryhttps://github.com/WeaveITMeta/Forge
max_upload_size
id2051381
size517,523
Weave (WeaveITMeta)

documentation

https://docs.rs/forge-orchestration

README

Forge Orchestration

Rust-Native Orchestration Platform for Distributed Workloads

Crates.io Documentation License Rust

A high-performance orchestration platform for Rust, designed to manage distributed workloads at hyper-scale. 10-2000x faster scheduling than Kubernetes with intelligent bin-packing for optimal resource utilization.

Performance Benchmarks

Scale K8s Baseline Forge Standard Forge Optimized Forge Batch
100 nodes 500/sec 40,013/sec (80x) 27,642/sec (55x) 1,007,271/sec (2014x)
500 nodes 467/sec 10,991/sec (24x) 17,381/sec (37x) 236,674/sec (506x)
1000 nodes 427/sec 6,097/sec (14x) 13,483/sec (32x) 122,331/sec (287x)
5000 nodes 100/sec 2,453/sec (25x) 8,599/sec (86x) 24,889/sec (249x)

K8s baseline from Kubernetes Scheduling Framework documentation

Key Performance Innovations

  • Lock-free parallel scoring with Rayon for concurrent node evaluation
  • Pre-computed score caches for O(1) node lookups
  • Integer-only scoring - no floating point in hot paths
  • Batch scheduling for amortized overhead (up to 1M decisions/sec)
  • First-Fit Decreasing bin-packing for optimal utilization

Features

Feature Description
High-Performance Scheduler 10-2000x faster than K8s with bin-packing, spread, GPU-locality algorithms
Control Plane Kubernetes-style API server with admission controllers and watch streams
Multi-Region Federation Geo-aware routing, cross-region replication, latency-based failover
MoE Routing Intelligent request routing with learned, GPU-aware, and version-aware strategies
Autoscaling Threshold-based and target-utilization policies with hysteresis
Resilience Circuit breakers, exponential backoff retry, graceful degradation
Game Server SDK UDP/TCP port allocation, session management, spot instance handling
AI/ML Inference Request batching, SSE streaming for LLM tokens

Installation

[dependencies]
forge-orchestration = "0.4.0"
tokio = { version = "1", features = ["full"] }

Quick Start

Control Plane

use forge_orchestration::{ForgeBuilder, AutoscalerConfig, Job, Task, Driver};

#[tokio::main]
async fn main() -> forge_orchestration::Result<()> {
    // Build the orchestrator
    let forge = ForgeBuilder::new()
        .with_autoscaler(AutoscalerConfig::default()
            .upscale_threshold(0.8)
            .downscale_threshold(0.3))
        .build()?;

    // Define and submit a job
    let job = Job::new("my-service")
        .with_group("api", Task::new("server")
            .driver(Driver::Exec)
            .command("/usr/bin/server")
            .args(vec!["--port", "8080"])
            .resources(500, 256));

    forge.submit_job(job).await?;

    // Run the control plane
    forge.run().await?;
    Ok(())
}

Workload SDK

The SDK is included in the main crate under forge_orchestration::sdk:

use forge_orchestration::sdk::{ready, allocate_port, graceful_shutdown, shutdown_signal};

#[tokio::main]
async fn main() -> forge_orchestration::Result<()> {
    // Signal readiness to orchestrator
    ready()?;

    // Allocate a port dynamically
    let port = allocate_port(8000..9000)?;
    println!("Listening on port {}", port);

    // Install graceful shutdown handlers
    graceful_shutdown();

    // ... your server logic ...

    // Wait for shutdown signal
    shutdown_signal().await;
    Ok(())
}

Architecture

[User App] --> [Forge SDK] (ready(), allocate(), shutdown())
              |
              v
[Forge Control Plane]
  - Tokio Runtime (async loops)
  - Rayon (parallel alloc)
  - Raft (consensus)
  - State: RocksDB (local) + etcd (distributed)
  - MoE Router (gating to experts)
  |
  v
[Nomad Scheduler] (jobs: containers/binaries)
  |
  v
[Workers/Nodes]
  - QUIC/TLS Networking
  - Prometheus Metrics

API Reference

Modules

Module Description
job Job, Task, TaskGroup, Driver definitions
moe MoERouter trait, DefaultMoERouter, LoadAwareMoERouter, RoundRobinMoERouter
autoscaler Autoscaler, AutoscalerConfig, ScalingPolicy trait
nomad NomadClient for HashiCorp Nomad API
storage StateStore trait, MemoryStore, FileStore
networking HttpServer, QuicTransport
metrics ForgeMetrics, MetricsExporter, MetricsHook trait
sdk Workload SDK: ready(), allocate_port(), graceful_shutdown(), ForgeClient

MoE Routing

Built-in routers:

  • DefaultMoERouter: Hash-based consistent routing
  • LoadAwareMoERouter: Routes to least-loaded expert with affinity
  • RoundRobinMoERouter: Sequential distribution

Custom router:

use forge_orchestration::moe::{MoERouter, RouteResult};
use async_trait::async_trait;

struct MyRouter;

#[async_trait]
impl MoERouter for MyRouter {
    async fn route(&self, input: &str, num_experts: usize) -> RouteResult {
        RouteResult::new(input.len() % num_experts)
    }
    fn name(&self) -> &str { "my-router" }
}

Autoscaling

use forge_orchestration::AutoscalerConfig;

let config = AutoscalerConfig::default()
    .upscale_threshold(0.8)
    .downscale_threshold(0.3)
    .hysteresis_secs(300)
    .bounds(1, 100);

Storage

use forge_orchestration::storage::{MemoryStore, FileStore};

let memory = MemoryStore::new();
let file = FileStore::open("/var/lib/forge/state.json")?;

Metrics

use forge_orchestration::ForgeMetrics;

let metrics = ForgeMetrics::new()?;
metrics.record_job_submitted();
metrics.record_scale_event("my-job", "up");
let text = metrics.gather_text()?;

SDK Functions

Function Description
sdk::ready() Signal readiness to orchestrator
sdk::allocate_port(range) Allocate an available port from range
sdk::release_port(port) Release an allocated port
sdk::graceful_shutdown() Install SIGTERM/SIGINT handlers
sdk::shutdown_signal() Async wait for shutdown signal
sdk::ForgeClient HTTP client for Forge API

Environment Variables

Variable Description
FORGE_API Forge API endpoint for SDK
FORGE_ALLOC_ID Allocation ID (set by orchestrator)
FORGE_TASK_NAME Task name (set by orchestrator)

Builder Configuration

use forge_orchestration::ForgeBuilder;

ForgeBuilder::new()
    .with_nomad_api("http://localhost:4646")
    .with_nomad_token("secret-token")
    .with_store_path("/var/lib/forge/state.json")
    .with_node_name("forge-1")
    .with_datacenter("dc1")
    .with_autoscaler(AutoscalerConfig::default())
    .with_metrics(true)
    .build()?

License

Apache 2.0

Commit count: 1

cargo fmt