systemd_reload

Crates.io	systemd_reload
lib.rs	systemd_reload
version	0.1.12
created_at	2025-12-12 07:51:00.051189+00
updated_at	2025-12-12 08:51:14.399737+00
description	Zero-downtime process management for Rust applications
homepage	https://github.com/js0/rust/tree/main/systemd_reload
repository	https://github.com/js0/rust.git
size	68,193

i18n.site (i18nsite)

systemd_reload: Zero-downtime Process Management for Rust Applications

A high-performance Rust library that provides systemd-style process management with zero-downtime rolling updates, graceful shutdown, and configurable timeout handling.

Features
Quick Start
API Reference
Architecture
Technology Stack
Project Structure
Examples
Historical Context

Features

Zero-downtime Rolling Updates: Seamlessly replace running processes without service interruption
Graceful Shutdown: Proper signal handling with configurable timeout for clean process termination
Configurable Timeout: Set a custom timeout for graceful worker shutdown
High Performance: Built with crossfire async message queues for minimal overhead
Signal Management: Handle SIGHUP for reload, SIGTERM/SIGINT for shutdown
Worker Mode Detection: Check an environment variable to tell workers apart from the supervisor, preventing infinite recursion when workers are spawned
Automatic Process Monitoring: Track and manage the lifecycle of worker processes

Quick Start

Add to your Cargo.toml:

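[dependencies]
systemd_reload = "0.1"
# The examples use #[tokio::main] and tokio::time, so Tokio is needed as well
tokio = { version = "1", features = ["macros", "rt-multi-thread", "time"] }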

Basic usage:

use systemd_reload::{run, is_worker};

#[tokio::main]
async fn main() {
    if is_worker() {
        // Worker process: run your main application logic
        run_application().await;
    } else {
        // Supervisor process: manage workers with a 600-second shutdown timeout
        run(600).await;
    }
}

async fn run_application() {
    // Your actual application code goes here
    println!("Worker process started, PID: {}", std::process::id());
    
    // Example: web server, background task, etc.
    loop {
        // Your business logic
        tokio::time::sleep(tokio::time::Duration::from_secs(1)).await;
    }
}

API Reference

Core Types

`Run`

The main process manager, which handles the worker lifecycle.

impl Run {
    pub fn new(timeout: u64) -> Self
    pub async fn run(mut self)
}

Parameters:

timeout: Maximum number of seconds to wait for a worker to exit gracefully before force-killing it
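What the `timeout` parameter controls can be sketched with the standard library alone: send SIGTERM, poll until a deadline, then fall back to SIGKILL. This is not the crate's implementation (the library is built on Tokio); `graceful_stop` and `StopOutcome` are hypothetical names, the code is Unix-only, and sending SIGTERM through the shell's `kill` builtin is a stand-in for a proper signal API.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// How a worker ended (hypothetical type, for illustration only).
#[derive(Debug)]
pub enum StopOutcome {
    /// Exited within the timeout after receiving SIGTERM.
    Graceful(ExitStatus),
    /// Still running at the deadline, so it was sent SIGKILL.
    Killed(ExitStatus),
}

/// Sends SIGTERM, waits up to `timeout_secs` for the child to exit,
/// then force-kills it. Unix-only sketch.
pub fn graceful_stop(child: &mut Child, timeout_secs: u64) -> io::Result<StopOutcome> {
    // std has no portable "send SIGTERM", so use the shell's `kill` builtin.
    let _sent = Command::new("sh")
        .arg("-c")
        .arg(format!("kill -TERM {}", child.id()))
        .status()?;

    let deadline = Instant::now() + Duration::from_secs(timeout_secs);
    while Instant::now() < deadline {
        // try_wait() reaps the child if it has exited, without blocking.
        if let Some(status) = child.try_wait()? {
            return Ok(StopOutcome::Graceful(status));
        }
        thread::sleep(Duration::from_millis(50));
    }

    // Timeout expired: Child::kill() sends SIGKILL on Unix.
    child.kill()?;
    Ok(StopOutcome::Killed(child.wait()?))
}
```

Polling with `try_wait` keeps the sketch dependency-free; an async runtime would instead race the child's exit future against a timer.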

`Exit`

Represents process exit information.

pub struct Exit {
    pub pid: u32,
    pub code: Option<i32>,
}
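One plausible way an `Exit` value could be filled in, assuming `code` mirrors the child's exit status as the standard library reports it: on Unix, `ExitStatus::code()` is `None` when the process was terminated by a signal (for example a SIGKILL sent after the timeout). The struct is re-declared so the sketch compiles on its own; `wait_for_exit` is a hypothetical helper, and the crate itself presumably waits asynchronously rather than with a blocking `wait()`.

```rust
use std::io;
use std::process::Child;

/// Local copy of the documented `Exit` shape, so this sketch is self-contained.
#[derive(Debug, PartialEq)]
pub struct Exit {
    pub pid: u32,
    pub code: Option<i32>,
}

/// Blocks until `child` exits and records its PID and exit code.
pub fn wait_for_exit(child: &mut Child) -> io::Result<Exit> {
    let pid = child.id();
    let status = child.wait()?;
    // On Unix, code() is None if the process was killed by a signal.
    Ok(Exit { pid, code: status.code() })
}
```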

Functions

`run(timeout: u64) -> impl Future<Output = ()>`

Convenience function to create and run the process manager in one call.

Parameters:

timeout: Maximum number of seconds to wait for workers to shut down gracefully before force-killing them

Example:

// Wait up to 600 seconds for graceful shutdown
run(600).await;

`is_worker() -> bool`

Detects if the current process is running in worker mode by checking the SYSTEMD_RELOAD_WORKER environment variable.
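The docs do not say how the supervisor launches workers. A common approach, and one the `is_worker()` check fits, is to re-run the current executable with the marker variable set, so the child takes the worker branch of `main` instead of starting another supervisor. The sketch below assumes that approach and that the variable's presence (not its value) is what counts; `is_worker_sketch`, `worker_command` and `WORKER_ENV` are hypothetical names, not the crate's API.

```rust
use std::env;
use std::io;
use std::process::Command;

/// Environment variable named in the docs as the worker marker.
pub const WORKER_ENV: &str = "SYSTEMD_RELOAD_WORKER";

/// Assumed semantics: the variable being set at all marks a worker.
pub fn is_worker_sketch() -> bool {
    env::var_os(WORKER_ENV).is_some()
}

/// Builds a command that re-runs this binary, with the same arguments,
/// in worker mode.
pub fn worker_command() -> io::Result<Command> {
    let mut cmd = Command::new(env::current_exe()?);
    cmd.args(env::args_os().skip(1)) // forward the original CLI arguments
        .env(WORKER_ENV, "true"); // the child will see itself as a worker
    Ok(cmd)
}
```

Because the child inherits the marker, its own `main` never reaches the supervisor branch, which is what stops the spawn recursion.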

Architecture

The library implements a supervisor-worker pattern with async message passing:

graph TD
    A[Supervisor Process] --> B[Signal Handler]
    A --> C[Worker Manager]
    A --> D[Message Queue]
    
    B --> E[SIGHUP: Rolling Update]
    B --> F[SIGTERM/SIGINT: Shutdown]
    
    C --> G[Spawn Worker]
    C --> H[Monitor Worker]
    C --> I[Handle Exit]
    
    G --> J[Worker Process 1]
    G --> K[Worker Process 2]
    
    H --> D
    I --> D
    
    D --> L[Process Events]
    L --> M[Supervisor Exits]
    L --> N[Clean Shutdown]
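The message-passing layout above can be approximated with the standard library to show the data flow: each worker gets a monitor that waits for it and posts an `Exit` event to a queue, and the supervisor loop removes workers from its tracking set as events arrive, returning once none remain (the "Supervisor Exits" path). This is an illustrative substitute, not the crate's code: it uses `std::sync::mpsc` and OS threads where the library uses Crossfire channels and Tokio, and `supervise` is a hypothetical name.

```rust
use std::collections::HashSet;
use std::io;
use std::process::Command;
use std::sync::mpsc;
use std::thread;

/// Exit event sent from a monitor thread to the supervisor loop.
#[derive(Debug)]
pub struct Exit {
    pub pid: u32,
    pub code: Option<i32>,
}

/// Spawns one worker per command, monitors each on its own thread, and
/// returns exit events in the order received, once every worker has exited.
pub fn supervise(commands: Vec<Command>) -> io::Result<Vec<Exit>> {
    let (tx, rx) = mpsc::channel::<Exit>();
    let mut alive = HashSet::new();

    for mut cmd in commands {
        let mut child = cmd.spawn()?;
        let pid = child.id();
        alive.insert(pid);
        let tx = tx.clone();
        // Monitor: block on this child, then report its exit to the queue.
        thread::spawn(move || {
            let code = child.wait().ok().and_then(|s| s.code());
            let _ = tx.send(Exit { pid, code });
        });
    }
    drop(tx); // only the monitor threads hold senders now

    let mut events = Vec::new();
    // Supervisor loop: drain events until no worker is tracked.
    while !alive.is_empty() {
        let exit = rx.recv().expect("monitor thread dropped its sender");
        alive.remove(&exit.pid);
        events.push(exit);
    }
    Ok(events)
}
```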

Process Flow

Initialization: The supervisor starts and spawns the initial worker process
Signal Monitoring: It listens for system signals and worker exit events
Rolling Update: On SIGHUP, it spawns a new worker, then gracefully terminates the old one
Worker Exit Handling: When a worker exits, it is removed from tracking; the supervisor exits once no workers remain
Graceful Shutdown: On SIGTERM/SIGINT, the supervisor sends SIGTERM to all workers, waits up to `timeout` seconds for them to exit, then force-kills any that remain with SIGKILL
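The ordering in the rolling-update step (start the replacement first, then retire the old worker) can be sketched with std processes. This is a Unix-only illustration, not the library's code; `rolling_update` and `stop` are hypothetical names. Note that it only guarantees the two processes overlap: it does not wait for the new worker to be ready to serve, and the docs do not say whether the library does.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// SIGTERM, then SIGKILL if the process outlives `timeout` (Unix-only sketch).
fn stop(child: &mut Child, timeout: Duration) -> io::Result<ExitStatus> {
    // Send SIGTERM via the shell's `kill` builtin (std has no direct API).
    let _sent = Command::new("sh")
        .arg("-c")
        .arg(format!("kill -TERM {}", child.id()))
        .status()?;
    let deadline = Instant::now() + timeout;
    while Instant::now() < deadline {
        if let Some(status) = child.try_wait()? {
            return Ok(status);
        }
        thread::sleep(Duration::from_millis(50));
    }
    child.kill()?; // SIGKILL on Unix
    child.wait()
}

/// Rolling update: spawn the replacement before stopping the old worker,
/// so some worker process exists throughout the swap.
pub fn rolling_update(
    old: &mut Child,
    new_cmd: &mut Command,
    timeout: Duration,
) -> io::Result<(Child, ExitStatus)> {
    let new = new_cmd.spawn()?; // 1. bring the new worker up first
    let old_status = stop(old, timeout)?; // 2. then retire the old one
    Ok((new, old_status))
}
```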

Technology Stack

Async Runtime: Tokio for async/await support
Message Queue: Crossfire async channels for high-performance message passing between tasks inside the supervisor
Signal Handling: signal-hook-tokio for async signal processing
Process Management: tokio::process for spawning and monitoring workers
Logging: Integration with the standard `log` facade crate

Project Structure

systemd_reload/
├── src/
│   ├── lib.rs          # Public API and exports
│   └── run.rs          # Core Run struct implementation
├── tests/
│   └── main.rs         # Integration tests
├── readme/
│   ├── en.md          # English documentation
│   └── zh.md          # Chinese documentation
└── Cargo.toml         # Project configuration

Examples

Web Server Example

use systemd_reload::{run, is_worker};
use log::info; // needs `log` in Cargo.toml; nothing is printed unless a logger (e.g. env_logger) is initialized

#[tokio::main]
async fn main() {
    if is_worker() {
        // Worker process: run your web server
        start_web_server().await;
    } else {
        // Supervisor process: manage workers with a 600-second shutdown timeout
        info!("Starting supervisor");
        run(600).await;
    }
}

async fn start_web_server() {
    info!("Web server started, PID: {}", std::process::id());
    
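    // Example web server setup (axum 0.7+ API; `handler` is your own route handler).
    // Old and new workers overlap during a rolling update, so both must be able to
    // bind the port (e.g. with SO_REUSEPORT) unless the listening socket is shared.
    // let app = axum::Router::new().route("/", axum::routing::get(handler));
    // let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    // axum::serve(listener, app).await.unwrap();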
    
    // Placeholder for demonstration
    loop {
        tokio::time::sleep(tokio::time::Duration::from_secs(1)).await;
    }
}

Testing Worker Detection

use systemd_reload::is_worker;
use std::env;

fn main() {
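    // Not a worker, assuming SYSTEMD_RELOAD_WORKER is unset when the program starts
    assert!(!is_worker());
    
    // Set the environment variable. set_var/remove_var require `unsafe` as of the
    // Rust 2024 edition because the environment is process-wide; only call them
    // while no other thread may be reading or writing it.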
    unsafe {
        env::set_var("SYSTEMD_RELOAD_WORKER", "true");
    }
    assert!(is_worker());
    
    // Clean up
    unsafe {
        env::remove_var("SYSTEMD_RELOAD_WORKER");
    }
    assert!(!is_worker());
}

Historical Context

The concept of zero-downtime deployments has roots in telecommunications and mission-critical systems from the 1960s. The term "hot swapping" originated from the ability to replace hardware components without shutting down the system.

In software, this pattern gained prominence with:

1970s: IBM mainframes introduced dynamic program replacement
1990s: Erlang/OTP pioneered hot code swapping for telecom systems
2000s: Web servers like nginx popularized graceful reloads
2010s: Container orchestration (Kubernetes) made rolling updates mainstream

systemd, released in 2010, brought this pattern to Linux system services with socket activation and service reloading. This library brings similar capabilities to Rust applications, leveraging modern async programming patterns for high-performance process management.

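The supervisor-worker pattern used here is inspired by Erlang's "let it crash" philosophy, in which supervisors monitor workers and ensure system resilience through isolation and recovery rather than prevention. Unlike an OTP supervisor, however, this library does not restart a worker that exits on its own: as described under Process Flow, the worker is removed from tracking and the supervisor exits once none remain.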

About

This project is an open-source component of js0.site ⋅ Refactoring the Internet Plan.

We are redefining the development paradigm of the Internet in a componentized way.

