ninelives

version: 0.2.0
created_at: 2025-11-25 14:29:27.767504+00
updated_at: 2025-11-25 19:34:38.239324+00
description: Resilience primitives for async Rust: retry, circuit breaker, bulkhead, timeout, and composable stacks.
repository: https://github.com/flyingrobots/ninelives
id: 1949872
size: 283,877
James Ross (flyingrobots)

Nine Lives 🐱

Tower-native fractal supervision for async Rust — autonomous, self-healing Services via composable policy algebra.

Resilience patterns for Rust with algebraic composition.

Nine Lives provides battle-tested resilience patterns (retry, circuit breaker, bulkhead, timeout) as composable tower layers with a unique algebraic composition system.

Features

  • 🔁 Retry policies with exponential/linear/constant backoff and jitter
  • Circuit breakers with half-open state recovery
  • 🚧 Bulkheads for concurrency limiting and resource isolation
  • ⏱️ Timeout policies integrated with tokio
  • 🧮 Algebraic composition via intuitive operators (+, |, &)
  • 🏎️ Fork-join for concurrent racing (Happy Eyeballs pattern)
  • 🔒 Lock-free implementations using atomics
  • 🏗️ Tower-native - works with any tower Service
  • 🌐 Companion sinks (OTLP, NATS, Kafka, Elastic, etcd, Prometheus, JSONL) via optional crates

Quick Start

Add to your Cargo.toml:

[dependencies]
ninelives = "0.2"
tower = "0.5"
tokio = { version = "1", features = ["full"] }

Basic Usage

use ninelives::prelude::*;
use std::time::Duration;
use tower::{Service, ServiceBuilder, ServiceExt};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Apply a timeout to any service
    let mut svc = ServiceBuilder::new()
        .layer(TimeoutLayer::new(Duration::from_secs(1))?)
        .service_fn(|req: &str| async move {
            Ok::<_, std::io::Error>(format!("Response: {}", req))
        });

    let response = svc.ready().await?.call("hello").await?;
    println!("{}", response);
    Ok(())
}

Algebraic Composition - The Nine Lives Advantage

Compose resilience strategies using intuitive operators:

  • Policy(A) + Policy(B) - Sequential composition: A wraps B
  • Policy(A) | Policy(B) - Fallback: try A, fall back to B on error
  • Policy(A) & Policy(B) - Fork-join: try both concurrently, return first success

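Precedence: these are ordinary Rust operators, so Rust's built-in precedence applies: + binds tighter than &, which binds tighter than | (+ > & > |). Use parentheses whenever a different grouping is intended.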

Example: Fallback Strategy

Try an aggressive timeout first, fall back to a longer timeout on failure:

use ninelives::prelude::*;
use std::time::Duration;
use tower::{ServiceBuilder, Layer};

let fast = Policy(TimeoutLayer::new(Duration::from_millis(100))?);
let slow = Policy(TimeoutLayer::new(Duration::from_secs(5))?);
let policy = fast | slow;

let svc = ServiceBuilder::new()
    .layer(policy)
    .service_fn(|req| async { Ok::<_, std::io::Error>(req) });

Example: Fork-Join (Happy Eyeballs)

Race two strategies concurrently and return the first success:

use ninelives::prelude::*;
use std::time::Duration;
use tower::ServiceBuilder;

// Create two timeout policies with different durations
let ipv4 = Policy(TimeoutLayer::new(Duration::from_millis(100))?);
let ipv6 = Policy(TimeoutLayer::new(Duration::from_millis(150))?);

// Race them concurrently - first success wins
let policy = ipv4 & ipv6;

let svc = ServiceBuilder::new()
    .layer(policy)
    .service_fn(|req| async { Ok::<_, std::io::Error>(req) });
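
To make the "first success wins" semantics concrete, here is a minimal sketch that uses only the standard library (threads and a channel) instead of ninelives itself. The function name race_first_ok is hypothetical and not part of the crate; the real & operator works on tower layers and async futures, so treat this as an illustration of the idea rather than the crate's implementation.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper (not a ninelives API): run two attempts concurrently
/// and return the first `Ok`. If both fail, an error is returned.
pub fn race_first_ok<T, E, A, B>(a: A, b: B) -> Result<T, E>
where
    T: Send + 'static,
    E: Send + 'static,
    A: FnOnce() -> Result<T, E> + Send + 'static,
    B: FnOnce() -> Result<T, E> + Send + 'static,
{
    let (tx_a, rx) = mpsc::channel();
    let tx_b = tx_a.clone();

    // Each attempt reports its result exactly once.
    thread::spawn(move || {
        let _ = tx_a.send(a());
    });
    thread::spawn(move || {
        let _ = tx_b.send(b());
    });

    // Panics only if both attempts panicked before reporting anything.
    let first = rx.recv().expect("at least one attempt should report a result");
    match first {
        // The first result to arrive is a success: it wins.
        Ok(value) => Ok(value),
        // The first result is a failure: wait for the other attempt.
        Err(first_err) => match rx.recv() {
            Ok(second) => second,
            Err(_) => Err(first_err),
        },
    }
}
```

Note that in this sketch the losing thread is not cancelled; it simply finishes in the background and its result is discarded.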

Example: Multi-Tier Resilience

Combine multiple strategies with automatic precedence:

use ninelives::prelude::*;
use std::time::Duration;

// Aggressive: just a fast timeout
let aggressive = Policy(TimeoutLayer::new(Duration::from_millis(50))?);

// Defensive: nested timeouts for retries
let defensive = Policy(TimeoutLayer::new(Duration::from_secs(10))?)
              + Policy(TimeoutLayer::new(Duration::from_secs(5))?);

// Try aggressive first, fall back to defensive
let policy = aggressive | defensive;
// Parsed as: Policy(Timeout50ms) | (Policy(Timeout10s) + Policy(Timeout5s))

Example: Circuit Breaker with Retry

use ninelives::prelude::*;
use std::time::Duration;

// Build a retry policy with exponential backoff
let retry = RetryPolicy::builder()
    .max_attempts(3)
    .backoff(Backoff::exponential(Duration::from_millis(100)))
    .with_jitter(Jitter::full())
    .build()?;

// Configure circuit breaker
let circuit_breaker = CircuitBreakerLayer::new(
    CircuitBreakerConfig::default()
        .failure_threshold(5)
        .timeout_duration(Duration::from_secs(10))
)?;

// Compose: circuit breaker wraps retry
let policy = Policy(circuit_breaker) + Policy(retry.into_layer());

Telemetry Sink Ladder

  • Baby mode: MemorySink::with_capacity(1_000) for local inspection.
  • Intermediate: NonBlockingSink(LogSink) to keep request paths non-blocking while logging.
  • Advanced: NonBlockingSink(OtlpSink) + StreamingSink fan-out for in-cluster consumers.
  • GOD MODE: StreamingSink → NATS/Kafka/Elastic via companion crates, with Observer + Sentinel auto-tuning when drop/evict metrics spike.
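
As a rough illustration of the lowest rung of the ladder, the sketch below shows a bounded in-memory buffer that evicts the oldest event when full and counts evictions. BoundedSink and its methods are hypothetical names chosen for this example; the crate's MemorySink may be structured quite differently.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded event buffer (not the crate's MemorySink).
pub struct BoundedSink<T> {
    events: VecDeque<T>,
    capacity: usize,
    evicted: u64,
}

impl<T> BoundedSink<T> {
    pub fn with_capacity(capacity: usize) -> Self {
        Self {
            events: VecDeque::with_capacity(capacity),
            capacity,
            evicted: 0,
        }
    }

    /// Store an event, evicting the oldest one when the buffer is full.
    pub fn record(&mut self, event: T) {
        if self.capacity == 0 {
            // Nothing can be stored; count the event as dropped.
            self.evicted += 1;
            return;
        }
        if self.events.len() == self.capacity {
            self.events.pop_front();
            self.evicted += 1;
        }
        self.events.push_back(event);
    }

    /// Iterate over retained events, oldest first.
    pub fn events(&self) -> impl Iterator<Item = &T> + '_ {
        self.events.iter()
    }

    /// Number of events evicted or dropped so far.
    pub fn evicted(&self) -> u64 {
        self.evicted
    }
}
```

An eviction counter like this is the kind of signal the "drop/evict metrics" mentioned above could be derived from.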

See recipes in src/cookbook.rs and companion cookbooks:

  • ninelives-otlp/README.md
  • ninelives-nats/README.md
  • ninelives-kafka/README.md
  • ninelives-elastic/README.md
  • ninelives-etcd/README.md
  • ninelives-prometheus/README.md
  • ninelives-jsonl/README.md

Cookbook (pick your recipe)

  • Simple retry: retry_fast — 3 attempts, 50ms exp backoff + jitter.
  • Latency guard: timeout_p95 — 300ms budget.
  • Bulkhead: bulkhead_isolate(max) — protect shared deps.
  • API guardrail (intermediate): api_guardrail — timeout + breaker + bulkhead.
  • Reliable read (advanced): reliable_read — fast path then fallback stack.
  • Hedged read (tricky): hedged_read — fork-join two differently-tuned stacks.
  • Hedge + fallback (god tier): hedged_then_fallback — race two fast paths, then fall back to a sturdy stack.
  • Sensible defaults: sensible_defaults — timeout + retry + bulkhead starter pack.

These recipes originally lived in src/cookbook.rs and have moved to the ninelives-cookbook crate (see its README/examples).

Tower Integration

Nine Lives layers work seamlessly with tower's ServiceBuilder:

use ninelives::prelude::*;
use tower::ServiceBuilder;
use std::time::Duration;

let service = ServiceBuilder::new()
    .layer(TimeoutLayer::new(Duration::from_secs(30))?)
    .layer(CircuitBreakerLayer::new(CircuitBreakerConfig::default())?)
    .layer(BulkheadLayer::new(10)?)
    .service(my_inner_service);

Or use the algebraic syntax:

let policy = Policy(TimeoutLayer::new(Duration::from_secs(30))?)
           + Policy(CircuitBreakerLayer::new(CircuitBreakerConfig::default())?)
           + Policy(BulkheadLayer::new(10)?);

let service = ServiceBuilder::new()
    .layer(policy)
    .service(my_inner_service);

Available Layers

TimeoutLayer

Enforces time limits on operations:

use ninelives::prelude::*;
use std::time::Duration;

let timeout = TimeoutLayer::new(Duration::from_secs(5))?;

RetryLayer

Retries failed operations with configurable backoff and jitter:

use ninelives::prelude::*;
use std::time::Duration;

let retry = RetryPolicy::builder()
    .max_attempts(3)
    .backoff(Backoff::exponential(Duration::from_millis(100)))
    .with_jitter(Jitter::full())
    .build()?
    .into_layer();

Backoff strategies:

  • Backoff::constant(duration) - Fixed delay
  • Backoff::linear(base) - Linear increase: base * attempt
  • Backoff::exponential(base) - Exponential: base * 2^attempt
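
The three formulas above can be sketched with the standard library alone. BackoffSketch is a hypothetical stand-in, not the crate's Backoff type, and the attempt numbering (the caller passes whatever attempt index it wants applied to the formula) is an assumption. Saturating arithmetic is used so that large attempt numbers cannot overflow.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the strategies listed above.
#[derive(Debug, Clone, Copy)]
pub enum BackoffSketch {
    Constant(Duration),
    Linear(Duration),
    Exponential(Duration),
}

impl BackoffSketch {
    /// Delay for a given attempt index, following the formulas literally.
    pub fn delay(self, attempt: u32) -> Duration {
        match self {
            // Fixed delay regardless of attempt.
            BackoffSketch::Constant(d) => d,
            // base * attempt
            BackoffSketch::Linear(base) => base.saturating_mul(attempt),
            // base * 2^attempt, saturating instead of overflowing
            BackoffSketch::Exponential(base) => {
                let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
                base.saturating_mul(factor)
            }
        }
    }
}
```

With a 100 ms base, the exponential variant yields 100 ms, 200 ms, 400 ms, and 800 ms for attempts 0 through 3.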

Jitter strategies:

  • Jitter::none() - No jitter
  • Jitter::full() - Random [0, delay]
  • Jitter::equal() - delay/2 + random [0, delay/2]
  • Jitter::decorrelated() - AWS-style stateful jitter
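
For the two stateless strategies, the arithmetic can be sketched as follows. The random source is passed in as a value in [0, 1] so the example needs no external crate; the function names are hypothetical, and how the crate draws its random numbers is not shown here.

```rust
use std::time::Duration;

/// Clamp a caller-supplied random sample into [0, 1]; NaN is treated as 0.
fn unit_interval(sample: f64) -> f64 {
    if sample.is_nan() {
        0.0
    } else {
        sample.clamp(0.0, 1.0)
    }
}

/// Full jitter: a random point in [0, delay].
pub fn full_jitter(delay: Duration, sample: f64) -> Duration {
    delay.mul_f64(unit_interval(sample))
}

/// Equal jitter: delay/2 plus a random point in [0, delay/2].
pub fn equal_jitter(delay: Duration, sample: f64) -> Duration {
    let half = delay / 2;
    half + half.mul_f64(unit_interval(sample))
}
```

Full jitter spreads retries over the whole interval, while equal jitter guarantees at least half of the computed delay.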

CircuitBreakerLayer

Prevents cascading failures with three-state management (Closed/Open/HalfOpen):

use ninelives::prelude::*;
use std::time::Duration;

let circuit_breaker = CircuitBreakerLayer::new(
    CircuitBreakerConfig::default()
        .failure_threshold(5)        // Open after 5 failures
        .timeout_duration(Duration::from_secs(10))  // Stay open for 10s
        .half_open_max_calls(3)      // Allow 3 test calls in half-open
)?;
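
The three-state behaviour configured above can be pictured with a small single-threaded state machine. This is only a sketch under stated assumptions: BreakerSketch and BreakerState are hypothetical names, any success closes the circuit, any failure in half-open re-opens it, and it uses &mut self instead of the atomics the crate describes.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BreakerState {
    Closed,
    Open,
    HalfOpen,
}

/// Hypothetical, single-threaded model of a circuit breaker.
pub struct BreakerSketch {
    state: BreakerState,
    failures: u32,
    opened_at: Option<Instant>,
    half_open_calls: u32,
    failure_threshold: u32,
    timeout_duration: Duration,
    half_open_max_calls: u32,
}

impl BreakerSketch {
    pub fn new(failure_threshold: u32, timeout_duration: Duration, half_open_max_calls: u32) -> Self {
        Self {
            state: BreakerState::Closed,
            failures: 0,
            opened_at: None,
            half_open_calls: 0,
            failure_threshold,
            timeout_duration,
            half_open_max_calls,
        }
    }

    pub fn state(&self) -> BreakerState {
        self.state
    }

    /// Decide whether a call may proceed at time `now`.
    pub fn allow(&mut self, now: Instant) -> bool {
        match self.state {
            BreakerState::Closed => true,
            BreakerState::Open => {
                let waited = self
                    .opened_at
                    .map_or(Duration::ZERO, |t| now.saturating_duration_since(t));
                if waited >= self.timeout_duration && self.half_open_max_calls > 0 {
                    // Open period elapsed: admit the first probe call.
                    self.state = BreakerState::HalfOpen;
                    self.half_open_calls = 1;
                    true
                } else {
                    false
                }
            }
            BreakerState::HalfOpen => {
                if self.half_open_calls < self.half_open_max_calls {
                    self.half_open_calls += 1;
                    true
                } else {
                    false
                }
            }
        }
    }

    /// Assumption: any success resets the failure count and closes the circuit.
    pub fn on_success(&mut self) {
        self.failures = 0;
        self.state = BreakerState::Closed;
    }

    /// A failure in half-open re-opens immediately; otherwise count toward the threshold.
    pub fn on_failure(&mut self, now: Instant) {
        if self.state == BreakerState::HalfOpen {
            self.trip(now);
            return;
        }
        self.failures += 1;
        if self.failures >= self.failure_threshold {
            self.trip(now);
        }
    }

    fn trip(&mut self, now: Instant) {
        self.state = BreakerState::Open;
        self.opened_at = Some(now);
        self.failures = 0;
    }
}
```

Passing `now` explicitly keeps the sketch deterministic in tests, in the same spirit as the injectable sleepers described under Testability.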

BulkheadLayer

Limits concurrent requests for resource isolation:

use ninelives::prelude::*;

let bulkhead = BulkheadLayer::new(10)?;  // Max 10 concurrent requests
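
One way to picture a non-blocking concurrency limit is an atomic counter with an RAII permit, sketched below with the standard library only. BulkheadSketch and Permit are hypothetical names; this is not the crate's BulkheadLayer, just an illustration of rejecting work once the limit is reached instead of queueing it.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical lock-free concurrency limiter.
pub struct BulkheadSketch {
    in_flight: Arc<AtomicUsize>,
    max: usize,
}

/// Held while a request is in flight; releases its slot on drop.
pub struct Permit {
    in_flight: Arc<AtomicUsize>,
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}

impl BulkheadSketch {
    pub fn new(max: usize) -> Self {
        Self {
            in_flight: Arc::new(AtomicUsize::new(0)),
            max,
        }
    }

    /// Try to take a slot without blocking; `None` means capacity is exhausted.
    pub fn try_acquire(&self) -> Option<Permit> {
        let mut current = self.in_flight.load(Ordering::Acquire);
        loop {
            if current >= self.max {
                return None;
            }
            // Retry the increment if another thread changed the counter first.
            match self.in_flight.compare_exchange_weak(
                current,
                current + 1,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => {
                    return Some(Permit {
                        in_flight: Arc::clone(&self.in_flight),
                    })
                }
                Err(actual) => current = actual,
            }
        }
    }

    pub fn in_flight(&self) -> usize {
        self.in_flight.load(Ordering::Acquire)
    }
}
```

Because the permit releases its slot in Drop, a slot is returned even if the guarded operation returns early or panics.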

Error Handling

All resilience errors are unified under ResilienceError<E>:

use ninelives::ResilienceError;

match service.call(request).await {
    Ok(response) => { /* success */ },
    Err(ResilienceError::Timeout { .. }) => { /* timeout */ },
    Err(ResilienceError::CircuitOpen { .. }) => { /* circuit breaker open */ },
    Err(ResilienceError::RetryExhausted { failures, .. }) => {
        // All retry attempts failed
        eprintln!("Failed after {} attempts", failures.len());
    },
    Err(ResilienceError::Bulkhead { .. }) => { /* capacity exhausted */ },
    Err(ResilienceError::Inner(e)) => { /* inner service error */ },
}

Operator Precedence

When combining operators, understand the precedence rules:

// Rust's built-in precedence applies: + binds tighter than &, and & binds tighter than |
A | B + C & D   // Parsed as: A | ((B + C) & D)

// Use parentheses for explicit control
(A | B) + C     // C wraps the fallback between A and B

Examples:

// Try fast, fallback to slow with retry
let policy = fast | retry + slow;
// Equivalent to: fast | (retry + slow)

// Retry wraps a fallback
let policy = retry + (fast | slow);

// Happy Eyeballs: race IPv4 and IPv6
let policy = ipv4 & ipv6;
// Both called concurrently, first success wins

// Complex composition
let policy = aggressive | defensive + (ipv4 & ipv6);
// Try aggressive, fallback to defensive wrapping parallel attempts
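
Since +, | and & are overloaded through std::ops, their grouping is decided by the Rust parser and can be checked with a toy type that records how it was combined. Shape and leaf are hypothetical names used only for this check; they stand in for Policy values and carry no resilience behaviour.

```rust
use std::ops::{Add, BitAnd, BitOr};

/// Toy type that records the grouping chosen by the Rust parser.
#[derive(Debug, Clone, PartialEq)]
pub struct Shape(pub String);

/// Build a named leaf, standing in for a single policy.
pub fn leaf(name: &str) -> Shape {
    Shape(name.to_string())
}

impl Add for Shape {
    type Output = Shape;
    fn add(self, rhs: Shape) -> Shape {
        Shape(format!("({} + {})", self.0, rhs.0))
    }
}

impl BitAnd for Shape {
    type Output = Shape;
    fn bitand(self, rhs: Shape) -> Shape {
        Shape(format!("({} & {})", self.0, rhs.0))
    }
}

impl BitOr for Shape {
    type Output = Shape;
    fn bitor(self, rhs: Shape) -> Shape {
        Shape(format!("({} | {})", self.0, rhs.0))
    }
}

/// Returns the grouping of `A | B + C & D` as a string.
pub fn grouping_demo() -> String {
    (leaf("A") | leaf("B") + leaf("C") & leaf("D")).0
}
```

Calling grouping_demo() returns "(A | ((B + C) & D))", so a race nested inside a sequence needs explicit parentheses, as in A + (B & C).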

Testability

Nine Lives is designed for testing with dependency injection:

use ninelives::prelude::*;
use std::time::Duration;

// Use InstantSleeper for tests (no actual delays)
let retry = RetryPolicy::builder()
    .max_attempts(3)
    .backoff(Backoff::exponential(Duration::from_millis(100)))
    .with_sleeper(InstantSleeper)
    .build()?;

// TrackingSleeper records sleep durations for assertions
let tracker = TrackingSleeper::new();
let retry = RetryPolicy::builder()
    .max_attempts(3)
    .with_sleeper(tracker.clone())
    .build()?;

// ... exercise retry ...

let sleeps = tracker.get_sleeps();
assert_eq!(sleeps.len(), 2); // Slept twice before success
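
The dependency-injection idea behind these sleepers can be sketched without the crate. Everything below is hypothetical (SleeperSketch, RecordingSleeper, retry_with) and synchronous for brevity, whereas the crate's sleepers are used from async code; the point is only that the retry loop never calls a real sleep directly.

```rust
use std::sync::{Arc, Mutex};
use std::time::Duration;

/// Hypothetical abstraction over "wait for this long".
pub trait SleeperSketch {
    fn sleep(&self, duration: Duration);
}

/// Records requested sleeps instead of waiting.
#[derive(Clone, Default)]
pub struct RecordingSleeper {
    sleeps: Arc<Mutex<Vec<Duration>>>,
}

impl RecordingSleeper {
    pub fn sleeps(&self) -> Vec<Duration> {
        self.sleeps.lock().unwrap().clone()
    }
}

impl SleeperSketch for RecordingSleeper {
    fn sleep(&self, duration: Duration) {
        self.sleeps.lock().unwrap().push(duration);
    }
}

/// Retry `op` up to `max_attempts` times (at least once), sleeping
/// base * 2^attempt between attempts via the injected sleeper.
pub fn retry_with<S, T, E>(
    sleeper: &S,
    max_attempts: u32,
    base: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E>
where
    S: SleeperSketch,
{
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error.
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => {
                sleeper.sleep(base.saturating_mul(2u32.saturating_pow(attempt)));
                attempt += 1;
            }
        }
    }
}
```

A test can then assert on the recorded durations (for example 100 ms followed by 200 ms) without spending any wall-clock time.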

Roadmap (snapshot)

Nine Lives is marching toward autonomous, fractal resilience. Current focus:

  • ✅ Phase 0–1: Tower-native algebra + telemetry sinks (done)
  • 🚧 Phase 2: Control plane & adaptive configs (in progress)
  • 🧭 Phase 3: Observer for aggregated state (planned)
  • 🔮 Phase 5: Sentinel meta-policies + shadow eval (planned)

Full detail and milestones live in ROADMAP.md.

Performance

Nine Lives is built for production:

  • Lock-free circuit breaker state transitions using atomics
  • Zero-allocation backoff/jitter calculations with overflow protection
  • Minimal overhead - resilience layers aim to add < 1% latency in common cases (no published benchmarks yet)

Benchmarks coming soon.

Comparison to Other Libraries

Feature | Nine Lives | Resilience4j (Java) | Polly (C#) | tower
Uniform Service Abstraction | | | |
Algebraic Composition (+, |, &) | | | |
Fork-Join (Happy Eyeballs) | | | |
Tower Integration | ✅ Native | N/A | N/A | ✅ Native
Lock-Free Implementations | | Partial | Partial | Varies
Retry with Backoff/Jitter | | | |
Circuit Breaker | | | |
Bulkhead | | | |
Timeout | | | |

Nine Lives' unique advantage: Algebraic composition with fork-join support lets you express complex resilience strategies declaratively, including concurrent racing patterns like Happy Eyeballs, without nested builders or imperative code.

Examples

See the ninelives-cookbook/examples directory for runnable examples:

  • retry_only.rs - Focused retry with backoff, jitter, and should_retry
  • bulkhead_concurrency.rs - Non-blocking bulkhead behavior under contention
  • timeout_fallback.rs - Timeout with fallback policy
  • decorrelated_jitter.rs - AWS-style decorrelated jitter
  • algebra_composition.rs - Algebraic composition patterns
  • telemetry_basic.rs / telemetry_composition.rs - Attaching sinks and composing telemetry

Run with:

cargo run -p ninelives-cookbook --example timeout_fallback

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you shall be licensed as above, without any additional terms or conditions.

License

Apache License, Version 2.0 (LICENSE or http://www.apache.org/licenses/LICENSE-2.0)

© 2025 • James Ross • FLYING•ROBOTS
