systemd_reload

version: 0.1.12
created_at: 2025-12-12 07:51:00.051189+00
updated_at: 2025-12-12 08:51:14.399737+00
description: Zero-downtime process management for Rust applications
homepage: https://github.com/js0/rust/tree/main/systemd_reload
repository: https://github.com/js0/rust.git
id: 1981147
size: 68,193
owner: i18n.site (i18nsite)

systemd_reload: Zero-downtime Process Management for Rust Applications

A high-performance Rust library that provides systemd-style process management with zero-downtime rolling updates, graceful shutdown, and configurable timeout handling.

Features

  • Zero-downtime Rolling Updates: Seamlessly replace running processes without service interruption
  • Graceful Shutdown: Proper signal handling with configurable timeout for clean process termination
  • Configurable Timeout: Set custom timeout duration for graceful worker shutdown
  • High Performance: Built with crossfire async message queues for minimal overhead
  • Signal Management: Handle SIGHUP for reload, SIGTERM/SIGINT for shutdown
  • Worker Mode Detection: Prevent infinite recursion during process spawning
  • Automatic Process Monitoring: Track and manage the lifecycle of worker processes

Quick Start

Add to your Cargo.toml:

[dependencies]
systemd_reload = "0.1"

Basic usage:

use systemd_reload::{run, is_worker};

#[tokio::main]
async fn main() {
    if is_worker() {
        // Worker process: run your main application logic
        run_application().await;
    } else {
        // Supervisor process: manage workers with a 600-second timeout
        run(600).await;
    }
}

async fn run_application() {
    // Your actual application code goes here
    println!("Worker process started, PID: {}", std::process::id());
    
    // Example: web server, background task, etc.
    loop {
        // Your business logic
        tokio::time::sleep(tokio::time::Duration::from_secs(1)).await;
    }
}

API Reference

Core Types

Run

The main process manager that handles worker lifecycle.

impl Run {
    pub fn new(timeout: u64) -> Self
    pub async fn run(mut self)
}

Parameters:

  • timeout: Maximum seconds to wait for graceful worker shutdown before force killing

Exit

Represents process exit information.

pub struct Exit {
    pub pid: u32,
    pub code: Option<i32>,
}
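As a rough illustration of how such a record could be filled in, the sketch below mirrors the `Exit` struct locally and builds it from a standard-library `ExitStatus`. The helper names `exit_from_status` and `run_and_report` are hypothetical, and mapping `code` to `ExitStatus::code()` is an assumption suggested by the `Option<i32>` field type, not something the crate documents.

```rust
use std::io;
use std::process::{Command, ExitStatus};

/// Local mirror of the `Exit` struct shown above (not imported from the crate).
#[derive(Debug, PartialEq)]
pub struct Exit {
    pub pid: u32,
    pub code: Option<i32>,
}

/// Hypothetical helper: pair a child's PID with its exit status.
fn exit_from_status(pid: u32, status: ExitStatus) -> Exit {
    // On Unix, `ExitStatus::code()` is `None` when a signal ended the process.
    Exit { pid, code: status.code() }
}

/// Runs `sh -c <script>` to completion and reports how it exited.
fn run_and_report(script: &str) -> io::Result<Exit> {
    let mut child = Command::new("sh").arg("-c").arg(script).spawn()?;
    let pid = child.id();
    let status = child.wait()?;
    Ok(exit_from_status(pid, status))
}
```

Under this assumption, a worker that exits normally yields `Some(code)`, while one terminated by a signal (for example the SIGKILL fallback) yields `None`.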

Functions

run(timeout: u64) -> impl Future<Output = ()>

Convenience function to create and run the process manager in one call.

Parameters:

  • timeout: Maximum seconds to wait for graceful worker shutdown

Example:

// Wait up to 600 seconds for graceful shutdown
run(600).await;

is_worker() -> bool

Detects if the current process is running in worker mode by checking the SYSTEMD_RELOAD_WORKER environment variable.
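The sketch below shows one way an environment-variable flag like this can prevent a supervisor from recursively spawning supervisors: the parent sets the variable on the child it launches, and the child checks for it at startup. Only the variable name comes from this README; `worker_flag_set`, `worker_command`, the value `"true"`, and the rule that any value counts as set are assumptions for illustration, not the crate's actual implementation.

```rust
use std::ffi::OsStr;
use std::path::Path;
use std::process::Command;

/// Variable name taken from the README.
const WORKER_ENV: &str = "SYSTEMD_RELOAD_WORKER";

/// Hypothetical stand-in for `is_worker()`: treats any value as "worker mode".
fn worker_flag_set(value: Option<&OsStr>) -> bool {
    value.is_some()
}

/// Builds (but does not spawn) a command that re-runs `exe` as a worker.
/// The child sees the flag and runs application logic instead of supervising.
fn worker_command(exe: &Path) -> Command {
    let mut cmd = Command::new(exe);
    cmd.env(WORKER_ENV, "true"); // value is an assumption
    cmd
}
```

A supervisor built this way would typically call something like `worker_command(&std::env::current_exe()?).spawn()`, and the worker side would check `worker_flag_set(std::env::var_os(WORKER_ENV).as_deref())`.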

Architecture

The library implements a supervisor-worker pattern with async message passing:

graph TD
    A[Supervisor Process] --> B[Signal Handler]
    A --> C[Worker Manager]
    A --> D[Message Queue]
    
    B --> E[SIGHUP: Rolling Update]
    B --> F[SIGTERM/SIGINT: Shutdown]
    
    C --> G[Spawn Worker]
    C --> H[Monitor Worker]
    C --> I[Handle Exit]
    
    G --> J[Worker Process 1]
    G --> K[Worker Process 2]
    
    H --> D
    I --> D
    
    D --> L[Process Events]
    L --> M[Supervisor Exits]
    L --> N[Clean Shutdown]
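The diagram can be read as a small event loop: signal handlers and worker monitors push events onto a queue, and the supervisor decides what to do with each one. The following is a minimal sketch of that idea, using `std::sync::mpsc` as a stand-in for the Crossfire channel. The `Event`, `Action`, and `Supervisor` types and the `drive` function are invented for illustration and are not the crate's internals.

```rust
use std::sync::mpsc;

/// Events the supervisor reacts to (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    Reload,          // SIGHUP
    Shutdown,        // SIGTERM / SIGINT
    WorkerExit(u32), // the worker with this PID exited
}

/// What the supervisor should do next.
#[derive(Debug, PartialEq)]
enum Action {
    SpawnThenStop(Vec<u32>), // start a new worker, then stop these old ones
    StopAll(Vec<u32>),       // ask every tracked worker to stop
    Continue,                // nothing to do, keep waiting
    Exit,                    // no workers left, supervisor exits
}

#[derive(Default)]
struct Supervisor {
    workers: Vec<u32>,
}

impl Supervisor {
    /// Start tracking a freshly spawned worker.
    fn track(&mut self, pid: u32) {
        self.workers.push(pid);
    }

    /// Pure decision step: update tracking and say what should happen.
    fn handle(&mut self, event: Event) -> Action {
        match event {
            Event::Reload => Action::SpawnThenStop(self.workers.clone()),
            Event::Shutdown => Action::StopAll(self.workers.clone()),
            Event::WorkerExit(pid) => {
                self.workers.retain(|&p| p != pid);
                if self.workers.is_empty() {
                    Action::Exit
                } else {
                    Action::Continue
                }
            }
        }
    }
}

/// Processes events until the supervisor decides to exit or every sender
/// has been dropped; returns the actions in the order they were chosen.
fn drive(sup: &mut Supervisor, rx: mpsc::Receiver<Event>) -> Vec<Action> {
    let mut actions = Vec::new();
    for event in rx {
        let action = sup.handle(event);
        let done = action == Action::Exit;
        actions.push(action);
        if done {
            break;
        }
    }
    actions
}
```

Keeping the decision step free of I/O makes the supervisor's behavior easy to test without spawning real processes.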

Process Flow

  1. Initialization: The supervisor starts and spawns an initial worker process
  2. Signal Monitoring: The supervisor listens for system signals and worker exit events
  3. Rolling Update: On SIGHUP, the supervisor spawns a new worker and gracefully terminates the old one
  4. Worker Exit Handling: When a worker exits, it is removed from tracking; the supervisor exits once no workers remain
  5. Graceful Shutdown: On SIGTERM/SIGINT, the supervisor sends SIGTERM to all workers, waits up to the configured timeout for them to exit, then force kills any that remain with SIGKILL
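The graceful shutdown step (SIGTERM, wait up to the timeout, then SIGKILL) can be sketched with the standard library alone. This is a blocking, Unix-only illustration rather than the crate's Tokio-based implementation. `stop_with_timeout` is a hypothetical name, and SIGTERM is delivered through `sh -c "kill -TERM <pid>"` only because `std` has no direct API for sending that signal.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Asks `child` to stop with SIGTERM, waits up to `timeout`, then force kills it.
fn stop_with_timeout(child: &mut Child, timeout: Duration) -> io::Result<ExitStatus> {
    // Polite request first. A failure here is ignored on purpose, because
    // the SIGKILL fallback below still guarantees termination.
    let _ = Command::new("sh")
        .arg("-c")
        .arg(format!("kill -TERM {}", child.id()))
        .status();

    // Poll until the child exits or the deadline passes.
    let deadline = Instant::now() + timeout;
    while Instant::now() < deadline {
        if let Some(status) = child.try_wait()? {
            return Ok(status);
        }
        thread::sleep(Duration::from_millis(50));
    }

    // Timeout elapsed: `Child::kill` sends SIGKILL on Unix.
    child.kill()?;
    child.wait()
}
```

For example, calling `stop_with_timeout(&mut child, Duration::from_secs(600))` would mirror the 600-second timeout used elsewhere in this README. An async version would replace the polling loop with a timed wait on the child.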

Technology Stack

  • Async Runtime: Tokio for async/await support
  • Message Queue: Crossfire async channels for passing events between tasks inside the supervisor
  • Signal Handling: signal-hook-tokio for async signal processing
  • Process Management: Tokio process spawning and monitoring
  • Logging: Standard Rust log crate integration

Project Structure

systemd_reload/
├── src/
│   ├── lib.rs          # Public API and exports
│   └── run.rs          # Core Run struct implementation
├── tests/
│   └── main.rs         # Integration tests
├── readme/
│   ├── en.md          # English documentation
│   └── zh.md          # Chinese documentation
└── Cargo.toml         # Project configuration

Examples

Web Server Example

use systemd_reload::{run, is_worker};
use log::info;

#[tokio::main]
async fn main() {
    if is_worker() {
        // Worker process: run your web server
        start_web_server().await;
    } else {
        // Supervisor process: manage workers with a 600-second timeout
        info!("Starting supervisor");
        run(600).await;
    }
}

async fn start_web_server() {
    info!("Web server started, PID: {}", std::process::id());
    
    // Example web server setup (axum 0.7+ style; `axum::Server` was removed in 0.7)
    // let app = axum::Router::new().route("/", axum::routing::get(handler));
    // let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    // axum::serve(listener, app).await.unwrap();
    
    // Placeholder for demonstration
    loop {
        tokio::time::sleep(tokio::time::Duration::from_secs(1)).await;
    }
}

Testing Worker Detection

use systemd_reload::is_worker;
use std::env;

fn main() {
    // Initially not a worker
    assert!(!is_worker());
    
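    // Set environment variable (`set_var` is `unsafe` in the Rust 2024 edition
    // because mutating the environment is not thread-safe)
    unsafe {
        env::set_var("SYSTEMD_RELOAD_WORKER", "true");
    }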
    assert!(is_worker());
    
    // Clean up
    unsafe {
        env::remove_var("SYSTEMD_RELOAD_WORKER");
    }
    assert!(!is_worker());
}

Historical Context

The concept of zero-downtime deployments has roots in telecommunications and mission-critical systems from the 1960s. The term "hot swapping" originated from the ability to replace hardware components without shutting down the system.

In software, this pattern gained prominence with:

  • 1970s: IBM mainframes introduced dynamic program replacement
  • 1990s: Erlang/OTP pioneered hot code swapping for telecom systems
  • 2000s: Web servers like nginx popularized graceful reloads
  • 2010s: Container orchestration (Kubernetes) made rolling updates mainstream

systemd, released in 2010, brought this pattern to Linux system services with socket activation and service reloading. This library brings similar capabilities to Rust applications, leveraging modern async programming patterns for high-performance process management.

The supervisor-worker pattern used here is inspired by Erlang's "let it crash" philosophy, where supervisors monitor workers and the system stays resilient through isolation and recovery rather than prevention. Note that, per the Process Flow section, the supervisor here exits once all workers have exited rather than restarting them.


About

This project is an open-source component of js0.site ⋅ Refactoring the Internet Plan.

We are redefining the development paradigm of the Internet in a componentized way.


