# swim-rs

`swim-rs` is an implementation of the [SWIM protocol](https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf) in Rust, designed for efficient and scalable cluster membership and failure detection in distributed systems.

This library follows the SWIM paper and includes the optimizations described there, such as:

- **Piggybacking Instead of Multicasting**: Reduces network overhead by attaching membership updates to periodic messages rather than sending separate multicast messages.
- **Round-Robin Member Selection**: Probes every cluster member in turn, which bounds the time until a failed member is detected.
- **Suspicion Mechanism**: Introduces an intermediate state between alive and dead, reducing false positives in failure detection.
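A minimal sketch of randomized round-robin target selection, in the style the SWIM paper describes, is shown below. It is illustrative only: `RoundRobin` and `next_target` are hypothetical names, not part of the `swim-rs` API. The tiny xorshift generator stands in for a proper RNG crate because the Rust standard library has none, and handling of members that join or leave mid-pass is omitted.

```rust
/// Hypothetical sketch of randomized round-robin probe-target selection
/// (not the swim-rs API). Members are shuffled, visited in order, and
/// reshuffled once every member has been visited.
pub struct RoundRobin<T> {
    members: Vec<T>,
    next: usize,
    // xorshift64 state; std has no RNG, so a tiny one is used here.
    rng: u64,
}

impl<T: Clone> RoundRobin<T> {
    pub fn new(members: Vec<T>, seed: u64) -> Self {
        // A zero state would make xorshift return zeros forever.
        let mut rr = RoundRobin { members, next: 0, rng: seed.max(1) };
        rr.shuffle();
        rr
    }

    /// xorshift64: fast, non-cryptographic pseudo-random numbers.
    fn next_rand(&mut self) -> u64 {
        let mut x = self.rng;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.rng = x;
        x
    }

    /// Fisher-Yates shuffle of the member list.
    fn shuffle(&mut self) {
        for i in (1..self.members.len()).rev() {
            let j = (self.next_rand() % (i as u64 + 1)) as usize;
            self.members.swap(i, j);
        }
    }

    /// Returns the next member to probe, or `None` if there are no members.
    pub fn next_target(&mut self) -> Option<T> {
        if self.members.is_empty() {
            return None;
        }
        if self.next >= self.members.len() {
            // Pass complete: reshuffle so the order differs on the next pass.
            self.shuffle();
            self.next = 0;
        }
        let target = self.members[self.next].clone();
        self.next += 1;
        Some(target)
    }
}
```

Because the list is reshuffled after each full pass, every member is probed exactly once per pass, while the probing order still varies from pass to pass.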

## How It Works

`swim-rs` implements the SWIM protocol to manage cluster membership and detect node failures efficiently. In each protocol period, a node selects a peer and sends it a PING message to check that it is alive. If the peer does not acknowledge within a timeout, the node escalates by sending a PING_REQ to several other members, asking them to probe the peer on its behalf. This indirect probing reduces false positives: a peer is only suspected if neither the direct probe nor any of the indirect probes succeeds, so a single congested or lossy link is not enough to mark a healthy node as failed.
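The escalation from PING to PING_REQ can be sketched as a single function. This is a hypothetical illustration, not the `swim-rs` implementation: `probe`, `ProbeOutcome`, and the `ping`/`ping_via` callbacks are assumptions standing in for real network I/O and timeouts, and the indirect probes run one after another here, whereas a real node would send them concurrently.

```rust
/// Result of probing one member during a protocol period.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProbeOutcome {
    /// The target answered the direct PING.
    Alive,
    /// A helper relayed an ACK from the target after a PING_REQ.
    AliveIndirect,
    /// Neither the direct nor any indirect probe succeeded.
    Suspect,
}

/// Hypothetical sketch (not the swim-rs API) of one probe round.
///
/// * `ping(target)` returns true if `target` ACKs a direct PING in time.
/// * `ping_via(helper, target)` returns true if `helper` reports an ACK
///   from `target`, i.e. the PING_REQ succeeded.
pub fn probe<F, G>(
    target: &str,
    members: &[&str],
    k: usize,
    mut ping: F,
    mut ping_via: G,
) -> ProbeOutcome
where
    F: FnMut(&str) -> bool,
    G: FnMut(&str, &str) -> bool,
{
    // Step 1: direct probe.
    if ping(target) {
        return ProbeOutcome::Alive;
    }
    // Step 2: ask up to `k` other members to probe the target for us.
    // (A real node would pick helpers at random and query them concurrently.)
    for &helper in members.iter().filter(|m| **m != target).take(k) {
        if ping_via(helper, target) {
            return ProbeOutcome::AliveIndirect;
        }
    }
    // Step 3: no ACK at all, so the target is suspected rather than declared dead.
    ProbeOutcome::Suspect
}
```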

To keep network usage low, `swim-rs` piggybacks membership updates and state changes on its regular protocol messages (sent over UDP) instead of sending separate multicast messages. Round-robin member selection ensures that every member is probed within a bounded number of protocol periods, rather than relying on purely random selection that could leave a node unprobed for a long time. The protocol also uses a suspicion mechanism: an unresponsive node is first marked as suspected rather than dead, giving a healthy but slow node the chance to refute the suspicion before it is declared failed.
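Piggybacking can be modeled as a queue of pending updates that are attached to outgoing messages. The sketch below is an illustration under stated assumptions, not the `swim-rs` API: `PiggybackQueue`, `push`, and `take` are hypothetical, `limit` stands in for the space left in a UDP datagram, and `max_transmissions` is a plain parameter (the SWIM paper suggests retransmitting each update roughly λ·log n times in a cluster of n members).

```rust
/// Hypothetical sketch of a piggyback buffer (not the swim-rs API).
/// Each queued membership update is attached to outgoing PING/ACK
/// messages until it has been sent `max_transmissions` times.
pub struct PiggybackQueue<U> {
    // (update, number of times it has already been sent)
    entries: Vec<(U, u32)>,
    max_transmissions: u32,
}

impl<U: Clone> PiggybackQueue<U> {
    pub fn new(max_transmissions: u32) -> Self {
        Self { entries: Vec::new(), max_transmissions }
    }

    /// Queues a new membership update for dissemination.
    pub fn push(&mut self, update: U) {
        self.entries.push((update, 0));
    }

    /// Picks up to `limit` updates to attach to the next outgoing message,
    /// preferring the updates that have been sent least often.
    pub fn take(&mut self, limit: usize) -> Vec<U> {
        // Stable sort: least-sent first; ties keep their current order.
        self.entries.sort_by_key(|(_, sent)| *sent);
        let n = limit.min(self.entries.len());
        let picked: Vec<U> = self.entries[..n].iter().map(|(u, _)| u.clone()).collect();
        for entry in &mut self.entries[..n] {
            entry.1 += 1;
        }
        // Drop updates that have been gossiped often enough.
        let max = self.max_transmissions;
        self.entries.retain(|(_, sent)| *sent < max);
        picked
    }
}
```

Preferring the least-sent updates lets fresh news spread first, while the transmission cap keeps old updates from crowding new ones out of size-limited messages.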

`swim-rs` also uses gossip-based dissemination to propagate membership information and state transitions across the cluster. As updates spread, all nodes converge on the same view of the cluster's state, which helps the protocol scale and tolerate failures.
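When gossip delivers conflicting updates about the same member, the suspicion mechanism in the SWIM paper resolves them with per-member incarnation numbers. The sketch below encodes the paper's override rules as a pure function; `State`, `MemberState`, `overrides`, and `refute` are hypothetical names, and treating `Dead` as final (with rejoining handled separately) is an assumption, not documented `swim-rs` behavior.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Alive,
    Suspect,
    Dead,
}

/// A member's state as seen by one node (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MemberState {
    pub state: State,
    pub incarnation: u64,
}

/// Returns true if `incoming` should replace `current`, following the
/// override rules of the SWIM paper's suspicion mechanism.
pub fn overrides(incoming: MemberState, current: MemberState) -> bool {
    use State::*;
    match (incoming.state, current.state) {
        // Assumption: a confirmed-dead entry is final.
        (_, Dead) => false,
        // A death confirmation overrides Alive or Suspect at any incarnation.
        (Dead, _) => true,
        // Alive only wins with a strictly newer incarnation.
        (Alive, _) => incoming.incarnation > current.incarnation,
        // Suspect beats Suspect only with a newer incarnation...
        (Suspect, Suspect) => incoming.incarnation > current.incarnation,
        // ...but beats Alive at the same or a newer incarnation.
        (Suspect, Alive) => incoming.incarnation >= current.incarnation,
    }
}

/// A suspected node refutes the suspicion by announcing itself Alive
/// with an incarnation number higher than the suspicion's.
pub fn refute(suspicion: MemberState) -> MemberState {
    MemberState { state: State::Alive, incarnation: suspicion.incarnation + 1 }
}
```

A suspected node that is still alive therefore only has to gossip one `Alive` update with a bumped incarnation number, and that update overrides the `Suspect` entry everywhere it spreads.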

## Features

- **Efficient Failure Detection**: Implements the SWIM protocol with optimizations for low network overhead and high accuracy.
- **Scalable Membership Management**: Manages cluster membership with support for dynamic node addition and removal.
- **Event-Driven Architecture**: Emits events for significant cluster changes, allowing applications to react accordingly.
- **Asynchronous Operations**: Built with Tokio for high-performance asynchronous networking.

## Installation

Add `swim-rs` to your `Cargo.toml`:

```toml
[dependencies]
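swim-rs = "0.1.0"
# Used by the examples in this README
tokio = { version = "1", features = ["full"] }
tracing = "0.1"
tracing-subscriber = "0.3"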
```

## Basic Example

The following example demonstrates how to create two nodes in the same cluster and run the SWIM protocol:

```rust
use std::time::Duration;

use swim_rs::{
    api::{config::SwimConfig, swim::SwimCluster},
    Result,
};

#[tokio::main]
async fn main() -> Result<()> {
    // Creates two nodes in the same cluster
    let node1 = SwimCluster::try_new("127.0.0.1:8080", SwimConfig::new()).await?;
    let node2 = SwimCluster::try_new(
        "127.0.0.1:8081",
        SwimConfig::builder()
            .with_known_peers(["127.0.0.1:8080"])
            .build(),
    )
    .await?;

    // Run the SWIM protocol in the background
    node1.run().await;
    node2.run().await;

    // Simulate a long-running process or service
    tokio::time::sleep(Duration::from_secs(12)).await;

    Ok(())
}
```

## Event Subscription Example

The following example demonstrates how to subscribe to events emitted by nodes in the SWIM cluster.

```rust
use std::time::Duration;

use swim_rs::{
    api::{config::SwimConfig, swim::SwimCluster},
    Event, Result,
};

#[tokio::main]
async fn main() -> Result<()> {
    // Print `tracing` output to stdout (requires the `tracing-subscriber` crate)
    tracing_subscriber::fmt::init();

    // Create two nodes in the same cluster; node2 joins via node1
    let node1 = SwimCluster::try_new("127.0.0.1:8080", SwimConfig::new()).await?;
    let node2 = SwimCluster::try_new(
        "127.0.0.1:8081",
        SwimConfig::builder()
            .with_known_peers(["127.0.0.1:8080"])
            .build(),
    )
    .await?;

    // Subscribe before starting the protocol so early events are not missed
    let mut rx1 = node1.subscribe();

    // Run the SWIM protocol in the background
    node1.run().await;
    node2.run().await;

    // Handle events from node1 until the channel closes
    let handle_events = async {
        while let Ok(event) = rx1.recv().await {
            match event {
                Event::NodeJoined(e) => tracing::info!("[{}] handle {:#?}", node1.addr(), e),
                Event::NodeSuspected(e) => tracing::info!("[{}] handle {:#?}", node1.addr(), e),
                Event::NodeRecovered(e) => tracing::info!("[{}] handle {:#?}", node1.addr(), e),
                Event::NodeDeceased(e) => tracing::info!("[{}] handle {:#?}", node1.addr(), e),
            }
        }
    };

    // Simulate a long-running service: process events for 12 seconds, then exit
    let _ = tokio::time::timeout(Duration::from_secs(12), handle_events).await;

    Ok(())
}
```

## Roadmap

The following features are planned for future releases to enhance the functionality, security, and robustness of swim-rs:

- Rate Limiting
- Authentication Checks
- Message Integrity Checks

## Contributing

Contributions are welcome!

Please open issues and submit pull requests for any enhancements or bug fixes.

1. Fork the repository.
2. Create a new branch: `git checkout -b feature/YourFeature`
3. Commit your changes: `git commit -m 'Add some feature'`
4. Push to the branch: `git push origin feature/YourFeature`
5. Open a pull request.

## License

This project is licensed under the Apache License, Version 2.0.

## Learn More

- [SWIM Protocol Paper](https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf)
- [GitHub Repository](https://github.com/marvinlanhenke/swim-rs)