rs3gw

High-Performance Enterprise Object Storage Gateway


rs3gw (Rust S3 Gateway) is an ultra-high-performance, enterprise-grade object storage gateway designed for AI/ML workloads, scientific computing (HPC), and large-scale data management. Built on Rust's zero-cost abstractions and powered by scirs2-io, it delivers S3-compatible access with predictable low latency, comprehensive observability, and advanced enterprise features.

🚀 Key Features

Core Capabilities

  • S3-Compatible API: Drop-in replacement for AWS S3 with 100+ operations
  • Multiple API Protocols: REST, gRPC, GraphQL, and WebSocket streaming
  • Zero-GC Performance: No garbage-collector pauses; Rust's ownership model delivers predictable, sub-millisecond latency
  • Edge Ready: Runs in containers as small as 50MB with minimal resource usage
  • Streaming I/O: Zero-copy streaming handles GB/TB files without memory bloat

Advanced Storage Features

  • Data Deduplication: Block-level deduplication with 30-70% storage savings
  • Smart Caching: ML-based predictive cache with pattern recognition
  • Transparent Compression: Automatic Zstd/LZ4 compression with configurable levels (e.g. zstd:3)
  • Multi-Backend Support: Local, MinIO, AWS S3, GCS, Azure Blob backends
  • S3 Select: SQL queries on CSV, JSON, Parquet, Avro, ORC, Protobuf, MessagePack

Enterprise & Security

  • Advanced Encryption: AES-256-GCM, ChaCha20-Poly1305 with envelope encryption
  • ABAC: Attribute-Based Access Control with time windows and IP filtering
  • Audit Logging: Immutable audit trail with cryptographic chain verification
  • Compliance Reports: SOC2, HIPAA, GDPR automated reporting
  • Object Lock: GOVERNANCE and COMPLIANCE modes with retention policies

Observability & Performance

  • Distributed Tracing: OpenTelemetry integration with Jaeger/Tempo
  • Prometheus Metrics: 50+ metrics for monitoring and alerting
  • Anomaly Detection: Statistical analysis for performance anomalies
  • Auto-Scaling: Dynamic resource adaptation based on load
  • Continuous Profiling: CPU, memory, and I/O profiling with flamegraphs

High Availability

  • Multi-Node Cluster: Multi-leader architecture with automatic failover
  • Cross-Region Replication: WAN-optimized replication with conflict resolution
  • Self-Healing: Automatic corruption detection and repair
  • Backup & Recovery: Point-in-time recovery with incremental backups

🏗️ Architecture

+----------------------------------------------------------------------+
|  Clients: PyTorch/TensorFlow | boto3 | aws-cli | gRPC | GraphQL       |
+-----------------------------------+------------------------------------+
                                    | HTTP/REST, gRPC, GraphQL, WebSocket
+-----------------------------------v------------------------------------+
|                              rs3gw Gateway                             |
|                                                                         |
|   REST API (100+ ops) | gRPC API (40+ ops) | GraphQL + WebSocket        |
|                                               (realtime events)         |
|                                                                         |
|   S3 Select Query Engine                                                |
|     SQL on CSV/JSON/Parquet/Avro/ORC with query optimization            |
|                                                                         |
|   Advanced Features Layer                                               |
|     Dedup | Zero-copy | ML Cache | ABAC | Encryption/Compression        |
|     Audit/Compliance                                                    |
|                                                                         |
|   Multi-Backend Storage Abstraction                                     |
|     Local | MinIO | AWS S3 | GCS | Azure | Ceph                         |
+-----------------------------------+------------------------------------+
                                    |
+-----------------------------------v------------------------------------+
|              scirs2-io High-Performance Storage Engine                 |
|   Compression (Zstd/LZ4) | Format I/O (Parquet) |                       |
|   Async Buffer Management (Direct I/O)                                  |
+-------------------------------------------------------------------------+

🎯 Quick Start

Prerequisites

  • Rust 1.85 or later
  • Linux, macOS, or Windows (WSL2)
  • (Optional) Docker and Docker Compose

Installation

# Clone the repository
git clone https://github.com/cool-japan/rs3gw.git
cd rs3gw

# Build release binary (optimized)
cargo build --release

# Run the server
./target/release/rs3gw

Docker Compose (Recommended for Development)

We provide a comprehensive development stack with monitoring:

# Start the full stack (rs3gw + Prometheus + Grafana + Jaeger + MinIO)
docker-compose -f docker-compose.dev.yml up -d

# Access services:
# - rs3gw S3 API: http://localhost:9000
# - Grafana Dashboard: http://localhost:3000 (admin/admin)
# - Prometheus: http://localhost:9091
# - Jaeger UI: http://localhost:16686
# - MinIO Console: http://localhost:9002 (minioadmin/minioadmin)
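
Once the stack is up, a quick connectivity check from Python (a minimal sketch; it assumes the default minioadmin/minioadmin credentials shown in the Configuration section below):

import boto3

# Point boto3 at the local rs3gw endpoint started by docker-compose
s3 = boto3.client(
    's3',
    endpoint_url='http://localhost:9000',
    aws_access_key_id='minioadmin',
    aws_secret_access_key='minioadmin',
)
print([b['Name'] for b in s3.list_buckets()['Buckets']])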

Configuration

rs3gw supports both TOML configuration files and environment variables:

  • TOML Configuration: Copy rs3gw.toml.example to rs3gw.toml and customize
  • Environment Variables: Copy .env.example to .env and customize
  • See TODO.md for the complete list of 50+ configuration options

Essential Configuration:

export RS3GW_BIND_ADDR="0.0.0.0:9000"
export RS3GW_STORAGE_ROOT="./data"
export RS3GW_ACCESS_KEY="minioadmin"
export RS3GW_SECRET_KEY="minioadmin"
export RS3GW_COMPRESSION="zstd:3"
export RS3GW_CACHE_ENABLED="true"
export RS3GW_DEDUP_ENABLED="true"

📚 Usage Examples

AWS CLI

# Configure endpoint
aws configure set default.s3.endpoint_url http://localhost:9000

# Create bucket and upload
aws s3 mb s3://my-bucket
aws s3 cp myfile.txt s3://my-bucket/

# S3 Select query (SQL on CSV/JSON/Parquet)
aws s3api select-object-content \
  --bucket my-bucket \
  --key data.csv \
  --expression "SELECT * FROM S3Object WHERE age > 30" \
  --expression-type SQL \
  --input-serialization '{"CSV": {"FileHeaderInfo": "USE"}}' \
  --output-serialization '{"CSV": {}}' \
  output.csv

Python (boto3)

import boto3

s3 = boto3.client(
    's3',
    endpoint_url='http://localhost:9000',
    aws_access_key_id='minioadmin',
    aws_secret_access_key='minioadmin',
    region_name='us-east-1'
)

# Basic operations
s3.create_bucket(Bucket='my-bucket')
s3.upload_file('local.txt', 'my-bucket', 'remote.txt')

# S3 Select
response = s3.select_object_content(
    Bucket='my-bucket',
    Key='data.csv',
    ExpressionType='SQL',
    Expression='SELECT name, age FROM S3Object WHERE age > 25',
    InputSerialization={'CSV': {'FileHeaderInfo': 'USE'}},
    OutputSerialization={'CSV': {}}
)
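
# Reading the S3 Select result (a sketch): boto3 returns an event stream,
# and 'Records' events carry the query output as raw bytes.
for event in response['Payload']:
    if 'Records' in event:
        print(event['Records']['Payload'].decode('utf-8'))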

# Multipart upload for large files
mpu = s3.create_multipart_upload(Bucket='my-bucket', Key='large.dat')
parts = []
# read_chunks is a user-supplied helper that yields the file in 5 MiB chunks
for i, chunk in enumerate(read_chunks('large.dat', 5*1024*1024), 1):
    part = s3.upload_part(
        Bucket='my-bucket', Key='large.dat',
        PartNumber=i, UploadId=mpu['UploadId'],
        Body=chunk
    )
    parts.append({'PartNumber': i, 'ETag': part['ETag']})
s3.complete_multipart_upload(
    Bucket='my-bucket', Key='large.dat',
    UploadId=mpu['UploadId'],
    MultipartUpload={'Parts': parts}
)
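
Presigned URLs and streaming downloads, continuing with the s3 client created above (a sketch using standard boto3 calls; presigned URL support is listed under Advanced Features below):

# Generate a temporary download URL valid for one hour
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-bucket', 'Key': 'large.dat'},
    ExpiresIn=3600,
)
print(url)

# Stream a large object to disk in 8 MiB chunks instead of loading it into memory
obj = s3.get_object(Bucket='my-bucket', Key='large.dat')
with open('large_copy.dat', 'wb') as f:
    for chunk in obj['Body'].iter_chunks(chunk_size=8 * 1024 * 1024):
        f.write(chunk)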

gRPC (High-Performance Binary Protocol)

use rs3gw_proto::s3_service_client::S3ServiceClient;
// ListBucketsRequest is assumed to be exported from the same generated proto crate.
use rs3gw_proto::ListBucketsRequest;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut client = S3ServiceClient::connect("http://localhost:9000").await?;

    let request = tonic::Request::new(ListBucketsRequest {});
    let response = client.list_buckets(request).await?;

    for bucket in response.into_inner().buckets {
        println!("Bucket: {}", bucket.name);
    }

    Ok(())
}

GraphQL

query {
  buckets {
    name
    creationDate
    objectCount
    totalSize
  }

  searchObjects(query: "*.parquet", bucket: "my-bucket") {
    key
    size
    lastModified
  }
}
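
The same query can be issued over HTTP from Python (a sketch; the /graphql endpoint path is an assumption, so check your gateway configuration):

import requests

query = """
query {
  buckets {
    name
    creationDate
    objectCount
    totalSize
  }
}
"""

# POST the GraphQL document as JSON; the endpoint path is assumed here
resp = requests.post('http://localhost:9000/graphql', json={'query': query})
print(resp.json())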

WebSocket (Real-Time Events)

const ws = new WebSocket('ws://localhost:9000/events/stream?bucket=my-bucket');

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log('Event:', data.event_type, data.object_key);
};
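
An equivalent subscriber in Python (a sketch using the third-party websockets package, installed with pip install websockets):

import asyncio
import json
import websockets  # third-party package: pip install websockets

async def watch_events():
    uri = 'ws://localhost:9000/events/stream?bucket=my-bucket'
    async with websockets.connect(uri) as ws:
        async for message in ws:
            data = json.loads(message)
            print('Event:', data['event_type'], data['object_key'])

asyncio.run(watch_events())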

Distributed Training API (AI/ML Workloads)

Manage machine learning training experiments, checkpoints, and hyperparameter searches:

# Create a training experiment
curl -X POST http://localhost:9000/api/training/experiments \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-model-training",
    "description": "Training ResNet-50 on ImageNet",
    "tags": ["resnet", "imagenet"],
    "hyperparameters": {
      "learning_rate": 0.001,
      "batch_size": 32,
      "epochs": 100
    }
  }'

# Save a checkpoint
curl -X POST http://localhost:9000/api/training/experiments/{experiment_id}/checkpoints \
  -H "Content-Type: application/json" \
  -d '{
    "epoch": 10,
    "model_state": "base64_encoded_model_data",
    "optimizer_state": "base64_encoded_optimizer_data",
    "metrics": {
      "loss": 0.234,
      "accuracy": 0.892
    }
  }'

# Load a checkpoint
curl http://localhost:9000/api/training/checkpoints/{checkpoint_id}

# Log training metrics
curl -X POST http://localhost:9000/api/training/experiments/{experiment_id}/metrics \
  -H "Content-Type: application/json" \
  -d '{
    "step": 1000,
    "metrics": {
      "loss": 0.234,
      "accuracy": 0.892,
      "val_loss": 0.256,
      "val_accuracy": 0.875
    }
  }'

# Get experiment metrics
curl http://localhost:9000/api/training/experiments/{experiment_id}/metrics

# List checkpoints
curl http://localhost:9000/api/training/experiments/{experiment_id}/checkpoints

# Update experiment status
curl -X PUT http://localhost:9000/api/training/experiments/{experiment_id}/status \
  -H "Content-Type: application/json" \
  -d '{"status": "completed"}'

# Create hyperparameter search
curl -X POST http://localhost:9000/api/training/searches \
  -H "Content-Type: application/json" \
  -d '{
    "search_space": {
      "learning_rate": [0.0001, 0.001, 0.01],
      "batch_size": [16, 32, 64]
    },
    "optimization_metric": "val_accuracy"
  }'

# Add trial result to hyperparameter search
curl -X POST http://localhost:9000/api/training/searches/{search_id}/trials \
  -H "Content-Type: application/json" \
  -d '{
    "parameters": {
      "learning_rate": 0.001,
      "batch_size": 32
    },
    "metrics": {
      "val_accuracy": 0.892
    },
    "status": "completed"
  }'

Python example with requests:

import requests
import base64
import json

# Create experiment
response = requests.post('http://localhost:9000/api/training/experiments', json={
    'name': 'pytorch-training',
    'description': 'Training with PyTorch',
    'tags': ['pytorch', 'cnn'],
    'hyperparameters': {
        'lr': 0.001,
        'batch_size': 32
    }
})
experiment = response.json()['experiment']
exp_id = experiment['id']

# Save checkpoint during training
import io
import torch

buffer = io.BytesIO()                   # in-memory buffer for the serialized state
torch.save(model.state_dict(), buffer)  # serialize your PyTorch model's weights
model_b64 = base64.b64encode(buffer.getvalue()).decode('utf-8')

requests.post(f'http://localhost:9000/api/training/experiments/{exp_id}/checkpoints', json={
    'epoch': 10,
    'model_state': model_b64,
    'metrics': {
        'loss': 0.234,
        'accuracy': 0.892
    }
})

# Log metrics every N steps
for step in range(1000):
    # ... training code ...
    if step % 100 == 0:
        requests.post(f'http://localhost:9000/api/training/experiments/{exp_id}/metrics', json={
            'step': step,
            'metrics': {
                'loss': current_loss,
                'accuracy': current_acc
            }
        })
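
Restoring a saved checkpoint later (a sketch; it uses the checkpoint load endpoint shown in the curl examples above, while the 'checkpoint' wrapper in the response, the checkpoint_id variable, and the in-scope model are assumptions):

import io
import base64
import requests
import torch

# Fetch a previously saved checkpoint; the response shape is an assumption,
# mirroring the 'experiment' wrapper returned when creating experiments
resp = requests.get(f'http://localhost:9000/api/training/checkpoints/{checkpoint_id}')
checkpoint = resp.json()['checkpoint']

# Decode the base64-encoded model state and load it back into the model
state_bytes = base64.b64decode(checkpoint['model_state'])
model.load_state_dict(torch.load(io.BytesIO(state_bytes)))
print('Restored epoch', checkpoint['epoch'], 'with metrics', checkpoint['metrics'])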

🛠️ Development Tools

Test Data Generator

Generate test datasets for benchmarking and testing:

# Generate a medium-sized mixed dataset
cargo run --bin testdata-generator -- dataset \
  --output ./testdata \
  --size medium

# Generate specific file types
cargo run --bin testdata-generator -- parquet \
  --output ./parquet-data \
  --count 10 \
  --rows 100000

S3 Migration Tool

Migrate data between S3-compatible systems:

# Copy all objects from MinIO to rs3gw
cargo run --bin s3-migrate -- copy \
  --source-endpoint http://minio:9000 \
  --source-access-key minioadmin \
  --source-secret-key minioadmin \
  --source-bucket source-bucket \
  --dest-endpoint http://localhost:9000 \
  --dest-access-key minioadmin \
  --dest-secret-key minioadmin \
  --dest-bucket dest-bucket \
  --concurrency 20

# Incremental sync with verification
cargo run --bin s3-migrate -- sync \
  --source-endpoint http://minio:9000 \
  --source-access-key minioadmin \
  --source-secret-key minioadmin \
  --source-bucket source-bucket \
  --dest-endpoint http://localhost:9000 \
  --dest-access-key minioadmin \
  --dest-secret-key minioadmin \
  --dest-bucket dest-bucket \
  --delete

# Verify data integrity
cargo run --bin s3-migrate -- verify \
  --source-endpoint http://minio:9000 \
  --source-access-key minioadmin \
  --source-secret-key minioadmin \
  --source-bucket source-bucket \
  --dest-endpoint http://localhost:9000 \
  --dest-access-key minioadmin \
  --dest-secret-key minioadmin \
  --dest-bucket dest-bucket

📊 Supported S3 Operations

Bucket Operations (26 operations)

  • ✅ ListBuckets, CreateBucket, DeleteBucket, HeadBucket
  • ✅ GetBucketLocation, GetBucketVersioning, PutBucketVersioning
  • ✅ GetBucketTagging, PutBucketTagging, DeleteBucketTagging
  • ✅ GetBucketPolicy, PutBucketPolicy, DeleteBucketPolicy
  • ✅ GetBucketCors, PutBucketCors, DeleteBucketCors
  • ✅ GetBucketEncryption, PutBucketEncryption, DeleteBucketEncryption
  • ✅ GetBucketLifecycleConfiguration, PutBucketLifecycleConfiguration
  • ✅ GetBucketReplication, PutBucketReplication
  • ✅ GetBucketNotificationConfiguration, PutBucketNotificationConfiguration
  • ✅ GetPublicAccessBlock, PutPublicAccessBlock

Object Operations (40+ operations)

  • ✅ ListObjectsV1, ListObjectsV2, ListObjectVersions
  • ✅ GetObject, PutObject, DeleteObject, DeleteObjects
  • ✅ HeadObject, CopyObject, GetObjectAttributes
  • ✅ GetObjectTagging, PutObjectTagging, DeleteObjectTagging
  • ✅ GetObjectAcl, PutObjectAcl
  • ✅ PostObject (browser upload)
  • ✅ SelectObjectContent (S3 Select with SQL)
  • ✅ Range requests, Conditional headers
  • ✅ Object Lock (GetObjectRetention, PutObjectRetention, GetObjectLegalHold, PutObjectLegalHold)
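
Object Lock retention from boto3, exercising the retention operations listed above (a sketch; put_object_retention and get_object_retention are standard boto3 methods):

import boto3
from datetime import datetime, timedelta, timezone

s3 = boto3.client('s3', endpoint_url='http://localhost:9000',
                  aws_access_key_id='minioadmin',
                  aws_secret_access_key='minioadmin')

# Place a GOVERNANCE-mode retention period on an object
s3.put_object_retention(
    Bucket='my-bucket',
    Key='remote.txt',
    Retention={
        'Mode': 'GOVERNANCE',
        'RetainUntilDate': datetime.now(timezone.utc) + timedelta(days=30),
    },
)

# Read the retention settings back
print(s3.get_object_retention(Bucket='my-bucket', Key='remote.txt')['Retention'])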

Multipart Upload (7 operations)

  • ✅ CreateMultipartUpload
  • ✅ UploadPart, UploadPartCopy
  • ✅ CompleteMultipartUpload
  • ✅ AbortMultipartUpload
  • ✅ ListParts, ListMultipartUploads

Advanced Features

  • ✅ S3 Select: SQL queries on CSV, JSON, Parquet, Avro, ORC, Protobuf, MessagePack
    • Aggregations: SUM, AVG, COUNT, MIN, MAX
    • GROUP BY, ORDER BY, LIMIT
    • Column pruning and predicate pushdown for Parquet
    • Query plan caching
  • ✅ Presigned URLs: Temporary access URLs with expiration
  • ✅ Server-Side Encryption: SSE-S3, SSE-C with AES-256-GCM
  • ✅ Checksums: CRC32C, CRC32, SHA256, SHA1, MD5 validation
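
Checksum-validated uploads from boto3 (a sketch; ChecksumAlgorithm and ChecksumMode are standard boto3 parameters, and CRC32C is one of the algorithms listed above):

import boto3

s3 = boto3.client('s3', endpoint_url='http://localhost:9000',
                  aws_access_key_id='minioadmin',
                  aws_secret_access_key='minioadmin')

# Ask the gateway to verify a CRC32C checksum on upload
s3.put_object(
    Bucket='my-bucket',
    Key='checked.bin',
    Body=b'payload bytes',
    ChecksumAlgorithm='CRC32C',
)

# Request checksum information back on HEAD
head = s3.head_object(Bucket='my-bucket', Key='checked.bin', ChecksumMode='ENABLED')
print(head.get('ChecksumCRC32C'))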

🔧 Advanced Configuration

Performance Tuning

# Data Deduplication (30-70% storage savings)
export RS3GW_DEDUP_ENABLED=true
export RS3GW_DEDUP_BLOCK_SIZE=65536
export RS3GW_DEDUP_ALGORITHM=content-defined

# Zero-Copy Optimizations
export RS3GW_ZEROCOPY_DIRECT_IO=true
export RS3GW_ZEROCOPY_SPLICE=true
export RS3GW_ZEROCOPY_MMAP=true

# Smart ML-based Caching
export RS3GW_CACHE_ENABLED=true
export RS3GW_CACHE_MAX_SIZE_MB=512
export RS3GW_CACHE_TTL=300

Security Configuration

# Encryption
export RS3GW_ENCRYPTION_ENABLED=true
export RS3GW_ENCRYPTION_ALGORITHM=aes256gcm

# Audit Logging
export RS3GW_AUDIT_ENABLED=true
export RS3GW_AUDIT_LOG_PATH=/var/log/rs3gw/audit.log

# ABAC (Attribute-Based Access Control)
export RS3GW_ABAC_ENABLED=true

Cluster Configuration

# Multi-node cluster with replication
export RS3GW_CLUSTER_ENABLED=true
export RS3GW_CLUSTER_NODE_ID=node1
export RS3GW_CLUSTER_ADVERTISE_ADDR=10.0.0.1:9001
export RS3GW_CLUSTER_SEED_NODES=10.0.0.2:9001,10.0.0.3:9001
export RS3GW_REPLICATION_MODE=quorum
export RS3GW_REPLICATION_FACTOR=3

Observability

# OpenTelemetry distributed tracing
export OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4317
export OTEL_TRACES_SAMPLER=traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1

# Profiling
export RS3GW_PROFILING_ENABLED=true
export RS3GW_PROFILING_INTERVAL_SECS=60

🎨 Object Transformations

rs3gw provides powerful server-side object transformation capabilities with extensible plugin support.

Supported Transformations

Type               | Feature Flag       | Status        | Use Cases
-------------------|--------------------|---------------|-----------------------------------
Image Processing   | default            | ✅ Production | Resize, crop, format conversion
Compression        | default            | ✅ Production | Zstd, Gzip, LZ4
Video Transcoding  | video-transcoding  | ✅ Production | Multi-codec video conversion
WASM Plugins       | wasm-plugins       | ✅ Production | Custom extensible transformations

Image Processing

// Resize and convert to WebP
// (ImageFormat is assumed to be exported from the same transformations module.)
use rs3gw::storage::transformations::{TransformationType, ImageTransformParams, ImageFormat};

let transform = TransformationType::Image {
    params: ImageTransformParams {
        width: Some(800),
        height: None,  // Maintains aspect ratio
        format: Some(ImageFormat::Webp),
        quality: Some(85),
        maintain_aspect_ratio: true,
        crop_mode: None,
    }
};

Features:

  • Multiple resize modes (exact, fit, crop, by-width, by-height)
  • Format conversion (JPEG, PNG, WebP, GIF, BMP, TIFF)
  • Quality control for lossy formats
  • Lanczos3 filtering for high-quality output

Video Transcoding

Requires: video-transcoding feature flag

# Build with video transcoding support
cargo build --features video-transcoding

// Transcode to H.264
// (VideoTransformParams and VideoCodec are assumed to come from the same
// transformations module as the image example above.)
use rs3gw::storage::transformations::{TransformationType, VideoTransformParams, VideoCodec};

let transform = TransformationType::Video {
    params: VideoTransformParams {
        codec: VideoCodec::H264,
        bitrate: Some(2000),  // 2000 kbps
        fps: Some(30),
        width: Some(1920),
        height: Some(1080),
        audio_codec: Some("aac".to_string()),
        audio_bitrate: Some(128),
    }
};

Supported Codecs: H.264, H.265/HEVC, VP8, VP9, AV1

WASM Plugins

Requires: wasm-plugins feature flag

# Build with WASM plugin support
cargo build --features wasm-plugins

Create custom transformations in WebAssembly:

// Register and use custom plugin
use std::collections::HashMap;

let transformer = WasmPluginTransformer::new();
let wasm_binary = std::fs::read("plugins/my-plugin.wasm")?;
transformer.register_plugin("my-plugin".to_string(), wasm_binary).await?;

let transform = TransformationType::WasmPlugin {
    plugin_name: "my-plugin".to_string(),
    params: HashMap::new(),
};

Documentation:

Build with All Features

# Build with all optional features enabled
cargo build --all-features --release

# Available features:
# - io_uring: Linux io_uring support (Linux only)
# - video-transcoding: FFmpeg-based video transcoding (requires FFmpeg)
# - wasm-plugins: WebAssembly plugin system (Pure Rust)

📈 Performance

rs3gw delivers exceptional performance through Rust's zero-cost abstractions:

Benchmarks

Run comprehensive benchmarks:

# Storage operations
cargo bench --bench storage_benchmarks

# S3 API operations
cargo bench --bench s3_api_benchmarks

# Load testing
cargo bench --bench load_testing_benchmarks

# Compression
cargo bench --bench compression_benchmarks

Key Performance Features

  • Zero-GC: No garbage collection pauses, predictable sub-millisecond latency
  • Zero-Copy: Streaming large files without memory bloat
  • Deduplication: 30-70% storage savings with content-defined chunking
  • ML Cache: Predictive prefetching improves hit rates by 20-40%
  • Query Optimization: Parquet column pruning reduces I/O by 50-80%
  • Direct I/O: Kernel bypass for large objects (>1MB)
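
On the client side, large objects can be streamed as concurrent multipart uploads with boto3's transfer manager (a sketch; the server-side zero-copy and deduplication paths listed above are independent of this and apply transparently):

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client('s3', endpoint_url='http://localhost:9000',
                  aws_access_key_id='minioadmin',
                  aws_secret_access_key='minioadmin')

# Stream large files as parallel multipart uploads; client memory use stays
# bounded by roughly chunk size x concurrency
config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,  # switch to multipart above 8 MiB
    multipart_chunksize=8 * 1024 * 1024,  # 8 MiB parts
    max_concurrency=8,                    # parallel part uploads
)
s3.upload_file('large.dat', 'my-bucket', 'large.dat', Config=config)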

🧪 Testing

# Run the full test suite (unit + integration)
cargo nextest run --all-features

# Run integration tests only
cargo test --test '*'

# Run with code coverage
cargo tarpaulin --all-features --out Html

# Run specific test suite
cargo test --test grpc_tests

# Run benchmarks
cargo bench

📖 Documentation

Guides

Module Documentation

Configuration Files

  • rs3gw.toml.example - TOML configuration template
  • .env.example - Environment variable template

🏢 Production Deployment

📘 See the Production Deployment Guide for comprehensive deployment instructions.

Quick Start: Kubernetes

# Deploy with Kustomize
kubectl apply -k k8s/overlays/production/

# Or with Helm
helm install rs3gw k8s/helm/rs3gw/ \
  --set replicaCount=3 \
  --set persistence.size=500Gi

Monitoring

Access the Grafana dashboard (included in docker-compose.dev.yml):

  • URL: http://localhost:3000
  • Default credentials: admin/admin
  • Pre-configured dashboards for:
    • Request rates and latency percentiles
    • Storage usage and object counts
    • Cache hit rates
    • Error rates by operation

🔬 SCIRS2 Policy Compliance

rs3gw is fully compliant with the SCIRS2 (Scientific Rust) ecosystem policies. This ensures high-quality, reproducible, and scientifically sound code.

Key Compliance Areas

  • ✅ Pure Rust: 100% Pure Rust in default features (C dependencies feature-gated)
  • ✅ No Warnings: Zero compiler and clippy warnings enforced
  • ✅ No Unwrap: All errors properly handled with Result types
  • ✅ SciRS2 Integration: Uses scirs2-core for RNG and scirs2-io for storage
  • ✅ Workspace Structure: Proper Cargo workspace with shared dependencies
  • ✅ File Size Limits: All files under 2,000 lines (largest: 1,828 lines)
  • ✅ Latest Crates: Dependencies kept up-to-date with crates.io
  • ✅ Code Formatting: cargo fmt enforced on all code

Random Number Generation

rs3gw uses scirs2-core::random instead of the standard rand crate for:

  • Better reproducibility in scientific contexts
  • Integration with SciRS2 statistical libraries
  • Consistent behavior across the ecosystem

Verification

Verify policy compliance:

# Run all policy checks
./scripts/verify_policies.sh

# Individual checks
cargo build --all-features  # No warnings
cargo clippy --all-targets  # No clippy warnings
cargo nextest run           # All tests pass (550/550)

For detailed policy information, see SCIRS2_POLICY.md.

🤝 Contributing

We welcome contributions! Please see our development process:

  1. Fork the repository
  2. Create a feature branch
  3. Run tests: cargo nextest run --all-features
  4. Run clippy: cargo clippy --all-features
  5. Ensure no unwrap() in production code
  6. Keep files under 2000 lines (use splitrs if needed)
  7. Submit a pull request

📊 Project Statistics

  • Language: Rust (100% Pure Rust default features)
  • Lines of Code: ~52,559 code lines (63,662 total including comments and blanks)
  • Test Coverage: 550 comprehensive tests (100% passing)
  • Modules: 134 Rust files
  • Dependencies: Carefully selected for performance and security (all up-to-date)
  • Policy Compliance: 100% SCIRS2 compliant

📜 License

This project is dual-licensed under:

Choose the license that best fits your use case.

🙏 Acknowledgments

🔗 Links


Built with ❤️ in Rust for performance-critical workloads
