peirene

Crates.io: peirene
lib.rs: peirene
version: 0.1.1
created_at: 2025-11-07 14:13:27.606708+00
updated_at: 2025-11-07 14:15:16.628877+00
description: Secure LT codes for unidirectional error-correction, CDNs, blockchains and more
repository: https://github.com/Silur/peirene.git
id: 1921635
size: 175,633
owner: Silur

README

This crate provides a production-ready implementation of Luby Transform (LT) codes with secure verification—a rateless erasure coding framework applicable across distributed storage, reliable networking, and decentralized systems.

LT codes are rateless fountain codes that enable encoding arbitrary binary data into an unlimited stream of coded packets, any $k + O(\log k)$ of which can recover the original data. The SeF (Secure Fountain) extension adds cryptographic verification to detect and reject corrupted or maliciously-formed packets, making the scheme robust against adversarial environments.
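At its core, LT encoding is just XOR: each coded packet (a "droplet") carries the XOR of a chosen subset of source blocks, and decoding "peels" degree-1 droplets one at a time, substituting each recovered block back into the remaining droplets. A minimal sketch of that mechanic (illustrative helper names, not the peirene API; degree sampling and cryptographic verification omitted):

```rust
// Toy XOR fountain code with a peeling decoder. All blocks must have
// equal length; neighbor sets stand in for randomly sampled subsets.

fn xor_into(acc: &mut Vec<u8>, block: &[u8]) {
    for (a, b) in acc.iter_mut().zip(block) {
        *a ^= *b;
    }
}

/// Encode: each droplet stores the XOR of its chosen source blocks
/// together with the index list of those blocks.
fn encode(blocks: &[Vec<u8>], neighbor_sets: &[Vec<usize>]) -> Vec<(Vec<usize>, Vec<u8>)> {
    neighbor_sets
        .iter()
        .map(|idxs| {
            let mut payload = vec![0u8; blocks[0].len()];
            for &i in idxs {
                xor_into(&mut payload, &blocks[i]);
            }
            (idxs.clone(), payload)
        })
        .collect()
}

/// Peeling decoder: repeatedly take a degree-1 droplet, recover its block,
/// and XOR that block out of every other droplet that references it.
fn peel(k: usize, mut droplets: Vec<(Vec<usize>, Vec<u8>)>) -> Option<Vec<Vec<u8>>> {
    let mut recovered: Vec<Option<Vec<u8>>> = vec![None; k];
    loop {
        let pos = match droplets.iter().position(|(idxs, _)| idxs.len() == 1) {
            Some(p) => p,
            None => break, // no degree-1 droplet left; either done or stuck
        };
        let (idxs, payload) = droplets.swap_remove(pos);
        let i = idxs[0];
        if recovered[i].is_none() {
            recovered[i] = Some(payload.clone());
            // Substitute the recovered block into the remaining droplets.
            for (idxs, data) in droplets.iter_mut() {
                if let Some(p) = idxs.iter().position(|&j| j == i) {
                    idxs.swap_remove(p);
                    xor_into(data, &payload);
                }
            }
        }
    }
    recovered.into_iter().collect() // None if any block is still missing
}

fn main() {
    let blocks = vec![b"aa".to_vec(), b"bb".to_vec(), b"cc".to_vec()];
    // Hand-picked neighbor sets that guarantee peeling succeeds.
    let sets = [vec![0], vec![0, 1], vec![1, 2]];
    let droplets = encode(&blocks, &sets);
    let recovered = peel(3, droplets).expect("decoding failed");
    assert_eq!(recovered, blocks);
    println!("recovered all {} blocks", recovered.len());
}
```

In the real scheme the neighbor sets are sampled from the robust soliton distribution, which is what makes any $k + O(\log k)$ droplets suffice with high probability.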

Why LT/SeF?

LT/SeF codes excel in scenarios requiring:

  • Distributed Storage & Archival: Redundantly encode files across multiple unreliable nodes; recover data from any sufficient subset, tolerating node failures, packet loss, and Byzantine corruption
  • Reliable Multicast & Broadcasting: Broadcast large files to heterogeneous receivers over lossy channels; clients recover via any combination of received packets (no retransmission needed)
  • Peer-to-Peer Networks: Encode content for DHT/IPFS-like systems; peers reconstruct from any superset of received fragments
  • Wireless & Satellite Networks: Rateless transmission to mobile or intermittently-available nodes; adapts automatically to varying loss rates
  • Blockchain & Distributed Ledgers: Reduce full-node storage (100x–1000x compression) while enabling efficient bootstrap
  • Disaster Recovery & Censorship Resistance: Survive substantial data loss or targeted deletion; reconstruct from partial, scattered, or adversarial sources

LT/SeF vs. Traditional Approaches

Challenge                  Traditional Solution       LT/SeF Advantage
Storage overhead           Replication (3x+)          Coded redundancy ($k/s$ ratio, tunable)
Network efficiency         ARQ (retransmit on loss)   Rateless (stream codewords until success)
Computation                Reed-Solomon (O(k²))       XOR-based (O(k log k))
Adversarial environments   Trust-all or trust-none    Cryptographic verification per packet
Adaptive environments      Static rates               Unlimited codeword generation

This Crate

This Rust implementation provides:

  • Robust Soliton Distribution – Optimal degree distribution balancing recovery probability and encoding efficiency

  • Error-Resilient Peeling Decoder – Detects and rejects corrupted packets via configurable verification

  • Precomputed Distributions – $O(\log k)$ sampling via binary search on CDF

  • Parallel Encoding – Optional Rayon-based parallelization for large epochs

  • Parallel Decoding – Work in progress

  • Flexible Verification – SHA-256 by default; custom hash functions supported

  • Comprehensive Error Handling – Rich error types for debugging and resilience
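The "precomputed distributions" bullet above is a standard trick: build the robust soliton CDF once, then sample each droplet's degree with a binary search. A self-contained sketch of that idea (illustrative function names, not the crate's actual API):

```rust
// Robust soliton distribution with O(log k) degree sampling via a
// precomputed CDF and binary search.

/// Unnormalized robust soliton weights mu(d) for d = 1..=k, following
/// Luby's construction: ideal soliton rho(d) plus robustness term tau(d).
fn robust_soliton_weights(k: usize, c: f64, delta: f64) -> Vec<f64> {
    let kf = k as f64;
    let r = c * (kf / delta).ln() * kf.sqrt();
    let spike = (kf / r).floor() as usize; // degree ~k/R gets an extra spike
    (1..=k)
        .map(|d| {
            let df = d as f64;
            let rho = if d == 1 { 1.0 / kf } else { 1.0 / (df * (df - 1.0)) };
            let tau = if d < spike {
                r / (df * kf)
            } else if d == spike {
                r * (r / delta).ln() / kf
            } else {
                0.0
            };
            rho + tau
        })
        .collect()
}

/// Precompute the normalized CDF once per epoch.
fn build_cdf(weights: &[f64]) -> Vec<f64> {
    let total: f64 = weights.iter().sum();
    let mut acc = 0.0;
    weights.iter().map(|w| { acc += w / total; acc }).collect()
}

/// Sample a degree from u drawn uniformly in (0, 1): binary search for the
/// first CDF entry >= u. partition_point is O(log k).
fn sample_degree(cdf: &[f64], u: f64) -> usize {
    cdf.partition_point(|&p| p < u) + 1 // degrees are 1-based
}

fn main() {
    let k = 100;
    let cdf = build_cdf(&robust_soliton_weights(k, 0.1, 0.1));
    assert!((cdf[k - 1] - 1.0).abs() < 1e-9); // CDF ends at 1
    // Degree 2 is the most likely outcome, as expected for soliton codes.
    println!("P(d = 2) ≈ {:.3}", cdf[1] - cdf[0]);
}
```

The one-time CDF construction costs O(k); every subsequent droplet pays only the O(log k) search.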

Usage

Basic Encode-Decode

use peirene::{Block, SeFEncoder, SeFDecoder, SolitonParams};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Create blocks
    let k = 100; // Epoch size
    let blocks: Vec<Block> = (0..k)
        .map(|i| Block::new(format!("Block {}", i).into_bytes(), None))
        .collect();

    // 2. Configure fountain codes
    let params = SolitonParams::new(k, 0.1, 0.1); // c=0.1, δ=0.1
    let encoder = SeFEncoder::new(params);

    // 3. Encode: generate droplets
    // Note: s must satisfy s >= k + sqrt(k)*ln²(k/δ)
    let s = 10000; // 100x the epoch size; per-node savings come from each node storing only a fraction of the droplets
    let droplets = encoder.encode(&blocks, s)?;
    println!("Generated {} droplets from {} blocks", droplets.len(), k);

    // 4. Extract header chain for verification
    let header_chain: Vec<_> = blocks
        .iter()
        .map(|b| (b.header_hash.clone(), b.merkle_root.clone()))
        .collect();

    // 5. Decode: recover original blocks
    let decoder = SeFDecoder::new(k, header_chain);
    let recovered = decoder.decode(droplets)?;
    println!("Successfully recovered {} blocks", recovered.len());

    // 6. Verify
    for (orig, recov) in blocks.iter().zip(recovered.iter()) {
        assert_eq!(orig.data, recov.data);
    }
    println!("✓ All blocks verified");

    Ok(())
}

Custom Merkle Implementation

use peirene::{Block, SeFEncoder};

fn custom_hash(data: &[u8]) -> Vec<u8> {
    // Use Blake3 or any other hash function
    use blake3;
    blake3::hash(data).as_bytes().to_vec()
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let blocks: Vec<Block> = (0..100)
        .map(|i| {
            let data = format!("Block {}", i).into_bytes();
            // Pass custom hash function
            Block::new(data, Some(custom_hash))
        })
        .collect();

    // Rest of encoding/decoding...
    Ok(())
}

Parallel Encoding

use peirene::{SeFEncoder, SolitonParams};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let k = 10000;
    let blocks = vec![/* ... */];
    let s = 100000; // Large scale

    let params = SolitonParams::new(k, 0.1, 0.1);
    let encoder = SeFEncoder::new(params);

    // Parallel encoding (if feature enabled)
    let droplets = encoder.encode_parallel(&blocks, s)?;
    println!("Parallel-encoded {} droplets", droplets.len());

    Ok(())
}

Handling Adversarial Droplets

use peirene::{Block, SeFEncoder, SeFDecoder, SolitonParams};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let k = 100;
    let blocks: Vec<Block> = (0..k)
        .map(|i| Block::new(format!("Block {}", i).into_bytes(), None))
        .collect();

    let params = SolitonParams::new(k, 0.1, 0.1);
    let encoder = SeFEncoder::new(params);
    let mut droplets = encoder.encode(&blocks, 1000)?;

    // Simulate adversary corrupting some droplets
    for i in 0..50 {
        droplets[i].data = vec![0xFF; droplets[i].data.len()];
    }

    let header_chain: Vec<_> = blocks
        .iter()
        .map(|b| (b.header_hash.clone(), b.merkle_root.clone()))
        .collect();

    let decoder = SeFDecoder::new(k, header_chain);
    let recovered = decoder.decode(droplets)?;

    // Decoder automatically rejected the corrupted droplets via Merkle verification
    println!("✓ Decoded successfully despite {} corrupted droplets", 50);
    assert_eq!(recovered.len(), k);

    Ok(())
}

How many droplets do I need? (how to set $s$)

The Bound Equation

To recover all $k$ original packets with high (tunable) probability (roughly $1 - \delta$), you need:

$$ m \approx k + \sqrt{k} \cdot \ln^2(k/\delta) $$

where:

  • $k$: Number of original packets (epoch size)
  • $m$: Minimum packets to receive/store for reliable recovery
  • $\delta$: Acceptable failure probability (e.g., 0.1 = 10% chance of failure)
  • $\ln^2(k/\delta)$: Natural logarithm, squared
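As a concrete instance of the bound, take $k = 100$ and $\delta = 0.1$:

$$ m \approx 100 + \sqrt{100} \cdot \ln^2(100/0.1) = 100 + 10 \cdot (6.91)^2 \approx 577 $$

so at $\delta = 0.1$ you need roughly 5.8x the original packet count for this epoch size.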

What This Means in Plain English

For small $k$ (e.g., $k=100$):

  • The $\sqrt{k} \cdot \ln^2(k/\delta)$ term dominates
  • You need roughly 5–6x more packets than original data (at $\delta=0.1$)
  • Example: 100 original packets → ~580 encoded packets minimum

For large $k$ (e.g., $k=10,000$):

  • The overhead term grows much slower than $k$
  • You need roughly 2–2.5x more packets (at $\delta=0.1$), and the ratio keeps shrinking toward 1 as $k$ grows
  • Example: 10,000 original packets → ~23,000 encoded packets minimum

Intuition: Small datasets need high relative redundancy; large datasets approach the optimal $m \approx k$.

How to Estimate $s$ in Your Code

fn estimate_min_buckets(k: usize, delta: f64) -> usize {
    let overhead = (k as f64).sqrt() * (k as f64 / delta).ln().powi(2);
    (k as f64 + overhead).ceil() as usize
}

// Examples:
let m1 = estimate_min_buckets(100, 0.1);     // ~578
let m2 = estimate_min_buckets(500, 0.1);     // ~2,123
let m3 = estimate_min_buckets(10000, 0.1);   // ~23,255

Practical Guidelines

Scenario           $k$      $\delta$   Min $s$    Overhead   Use Case
Small/test         10       0.1        ~80        ~8x        Experimentation
Medium             100      0.1        ~580       ~5.8x      Distributed storage
Large              1,000    0.1        ~3,700     ~3.7x      Production systems
Very large         10,000   0.1        ~23,000    ~2.3x      High-scale deployment
High reliability   1,000    0.01      ~5,200     ~5.2x      Mission-critical
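The table rows can be reproduced by evaluating the bound directly (a standalone sketch; `min_s` mirrors the `estimate_min_buckets` helper above):

```rust
// Evaluate m = k + sqrt(k) * ln^2(k/delta) for each table row.
fn min_s(k: usize, delta: f64) -> usize {
    let kf = k as f64;
    (kf + kf.sqrt() * (kf / delta).ln().powi(2)).ceil() as usize
}

fn main() {
    for (k, delta) in [(10, 0.1), (100, 0.1), (1_000, 0.1), (10_000, 0.1), (1_000, 0.01)] {
        let s = min_s(k, delta);
        println!(
            "k = {:>6}, delta = {:>4}: min s = {:>6} ({:.1}x overhead)",
            k, delta, s, s as f64 / k as f64
        );
    }
}
```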

Key Insight

The overhead is NOT "slightly more than $k$" as textbooks or papers suggest. For practical sizes:

  • $k < 1,000$: Expect roughly 4–8x overhead at $\delta=0.1$ (you must generate that many times more packets)
  • This is the price you pay for ratelessness and adversarial resilience
  • Compression savings come from storing a fraction of packets per node, not total network overhead

In Your Code

The encoder automatically validates this bound:

let params = SolitonParams::new(k, 0.1, 0.1);
let encoder = SeFEncoder::new(params);

// This will error if s is too small:
let droplets = encoder.encode(&blocks, s)?;
// Err: InsufficientDroplets { provided: 1000, required: 4000 }

References

"SeF: A Secure Fountain Architecture for Slashing Storage Costs in Blockchains"
Swanand Kadhe, Jichan Chung, and Kannan Ramchandran (UC Berkeley)
arXiv:1906.12140v1 [cs.CR]

M. Luby, "LT Codes", Proceedings of the 43rd Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2002.

