lightstream

Crates.io: lightstream
lib.rs: lightstream
version: 0.1.0
created_at: 2025-09-04 00:29:35.980836+00
updated_at: 2025-09-04 00:29:35.980836+00
description: Composable, zero-copy Arrow IPC and native data streaming for Rust with SIMD-aligned I/O, async support, and memory-mapping.
homepage:
repository:
max_upload_size:
id: 1823412
size: 1,511,842
owner: PB (pbower)

documentation:

README

Lightstream – Arrow IPC streaming without compromise

Why this crate exists

When building high-performance, full-stack, data-driven applications, I repeatedly hit the same barriers.

  • Protobuf/gRPC: convenient, but introduces memory copies.
  • FlightRPC: powerful, but heavyweight and overengineered for most cases.

I wanted a crate that would:

  1. Integrate the Arrow IPC memory format seamlessly into any async context
  2. Provide low-level control
  3. Establish composable patterns for arbitrary wire formats
  4. Require no heavy infrastructure to get started
  5. Offer a true zero-copy path from wire -> SIMD kernels without re-allocation
  6. Maintain reasonable compile times

Lightstream is the result: an extension to Minarrow that adds native streaming and I/O. It provides raw bytestream construction, reading and writing of the IPC and TLV formats, CSV and Parquet writers, and a high-performance, 64-byte SIMD-aligned, memory-mapped IPC reader.

Use cases

  • Bypassing library machinery and going straight to the wire

  • Streaming Arrow IPC tables over network sockets with zero-copy buffers

  • Defining custom binary transport protocols

  • Zero-copy ingestion with mmap for ultra-fast analytics

  • Controlling data alignment at the source with SIMD-aligned Arrow IPC writers/readers

  • Async data pipelines with backpressure-aware sinks and streams

  • Building custom data transport layers and transfer protocols

Introduction

Lightstream provides composable building blocks for high-performance data I/O in Rust:

  • Asynchronous Arrow IPC streaming and file writing
  • Framed decoders/sinks for IPC, TLV, CSV, and optional Parquet
  • Zero-copy, memory-mapped Arrow file reads (~4.5ms for 100M rows × 4 columns on a consumer laptop)
  • Direct integration with Tokio and futures using zero-copy buffers (a short sketch follows this list)
  • 64-byte SIMD-aligned readers/writers (the only Arrow-compatible crate providing this in 2025)
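
The Tokio integration above can be sketched as follows. This is illustrative only: it assumes TableWriter (shown with a file target in the Examples section below) also accepts a Tokio TcpStream as its write target, and that IPCMessageProtocol offers a Stream variant alongside File, as the Supported Formats section implies; check the crate documentation before relying on either assumption.

use lightstream::enums::IPCMessageProtocol;
use lightstream::io::table_writer::TableWriter;
use minarrow::Table;
use tokio::net::TcpStream;

// Illustrative sketch: stream a single table to a remote peer over TCP using
// the Arrow IPC Stream protocol. Assumes TableWriter accepts any Tokio
// AsyncWrite target, mirroring the file-based example later in this README.
async fn send_table(addr: &str, table: Table) -> std::io::Result<()> {
    let conn = TcpStream::connect(addr).await?;
    let schema = table.schema().to_vec();
    let mut writer = TableWriter::new(conn, schema, IPCMessageProtocol::Stream)?;
    writer.write_table(table).await?;
    writer.finish().await?;
    Ok(())
}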

Design Principles

  • Customisable – You own the buffer. Pull-based or sink-driven streaming.
  • Composable – Layerable codecs for encoders, decoders, sinks, and stream adapters.
  • Control – Wire-level framing: IPC, TLV, CSV, and Parquet are handled at the transport boundary.
  • Compatible – Native async support for futures and Tokio.
  • Power – 64-byte-aligned memory via Vec64 ensures deterministic SIMD without hot-loop re-allocations (see the alignment check after this list).
  • Extensible – Primitive building blocks to implement custom wire formats. Contributions welcome.
  • Efficient – Minimal dependencies and fast compile times.
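
To make the Power principle concrete, the check below asserts the 64-byte alignment guarantee. It is a minimal sketch that assumes Vec64 is exported from Minarrow's crate root and otherwise behaves like the standard Vec; consult the Minarrow documentation for the exact constructors.

use minarrow::Vec64;

fn main() {
    // Sketch only: Vec64's API is assumed here to mirror std::vec::Vec.
    // The guarantee that matters is that the backing allocation starts on a
    // 64-byte boundary, so SIMD loads never need a realignment copy.
    let buf: Vec64<u32> = (0..1_024).collect();
    assert_eq!(buf.as_ptr() as usize % 64, 0);
}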

Layered Abstractions

Layer               Provided by Lightstream        Replaceable
Framing             TlvFrame, IpcMessage           Yes
Buffering           StreamBuffer                   Yes
Encoding/Decoding   FrameEncoder, FrameDecoder     Yes
Streaming           GenByteStream, Sink            Yes
Formats             IPC, Parquet, CSV, TLV         Yes

Each layer is trait-based with a reference implementation. Swap in your own framing, buffering, or encoding logic without re-implementing the stack.


Supported Formats

  • Arrow IPC – Full support for SIMD-aligned File and Stream protocols, schema + dictionaries, streaming or random access.
  • TLV – Minimal type-length-value framing for telemetry, control, or lightweight transport.
  • Parquet (feature-gated) – Compact, columnar, compression-aware writer (Zstd, Snappy) with minimal dependencies.
  • CSV – Streaming Arrow/Minarrow table readers/writers with headers, nulls, and custom delimiter/null handling.
  • Memory Maps – Ultra-fast, zero-copy ingestion: millions of rows in microseconds, SIMD-ready.
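
The memory-mapped path can be sketched roughly as below. The reader type and module path are hypothetical stand-ins rather than confirmed API; enable the mmap feature and consult the documentation for the real entry point.

// Hypothetical sketch of zero-copy ingestion from a memory-mapped Arrow IPC
// file. MmapTableReader and its module path are illustrative names only.
use lightstream::models::readers::ipc::mmap_table_reader::MmapTableReader;

fn scan(path: &str) -> std::io::Result<()> {
    // Maps the file into memory; record batches are viewed in place rather
    // than copied into fresh allocations.
    let reader = MmapTableReader::open(path)?;
    for table in reader {
        let table = table?;
        println!("Loaded table: {:?}", table.name);
    }
    Ok(())
}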

Examples

Framed Stream Reader


use lightstream::models::streams::framed_byte_stream::FramedByteStream;
use lightstream::models::decoders::ipc::table_stream::TableStreamDecoder;
use lightstream::models::readers::ipc::table_stream_reader::TableStreamReader;

let framed = FramedByteStream::new(socket, TableStreamDecoder::default());
let mut reader = TableStreamReader::new(framed);

while let Some(table) = reader.next_table().await? {
    println!("Received table: {:?}", table.name);
}
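
Here, socket can be any asynchronous byte source. A Tokio TCP connection is a typical choice, assuming FramedByteStream accepts any AsyncRead input:

use tokio::net::TcpStream;

// Connect to a peer that is emitting Arrow IPC stream messages.
let socket = TcpStream::connect("127.0.0.1:9000").await?;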


Custom Protocol


pub struct MyFramer;

impl FrameDecoder for MyFramer {
    type Frame = Vec<u8>;

    fn decode(&mut self, buf: &[u8]) -> DecodeResult<Self::Frame> {
        // Custom framing logic
        todo!()
    }
}

let stream = FramedByteStream::new(socket, MyFramer);


Write Tables


use minarrow::{arr_i32, arr_str32, FieldArray, Table};
use lightstream::io::table_writer::TableWriter;
use lightstream::enums::IPCMessageProtocol;
use tokio::fs::File;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let col1 = FieldArray::from_inner("numbers", arr_i32![1, 2, 3]);
    let col2 = FieldArray::from_inner("letters", arr_str32!["x", "y", "z"]);
    let table = Table::new("demo".into(), vec![col1, col2].into());

    let file = File::create("demo.arrow").await?;
    let schema = table.schema().to_vec();
    let mut writer = TableWriter::new(file, schema, IPCMessageProtocol::File)?;

    writer.write_table(table).await?;
    writer.finish().await?;
    Ok(())
}



Optional Features

  • parquet – Parquet writer
  • mmap – Memory-mapped files
  • zstd – Zstd compression (IPC + Parquet)
  • snappy – Snappy compression (IPC + Parquet)
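
These are opt-in Cargo features. As an example, a manifest entry enabling memory-mapped reads and Zstd compression might look like the following (version taken from the crate metadata above; adjust as needed):

[dependencies]
lightstream = { version = "0.1.0", features = ["mmap", "zstd"] }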

Licence

This project is licensed under the MIT Licence. See the LICENCE file for full terms, and THIRD_PARTY_LICENSES for Apache-licensed dependencies.


Affiliation Notice

This project is not affiliated with Apache Arrow or the Apache Software Foundation.
It serialises the public Arrow format via a custom implementation (Minarrow), while reusing Flatbuffers schemas from Arrow-RS for schema type generation.
