| Field | Value |
|---|---|
| Crates.io | minarrow |
| lib.rs | minarrow |
| version | 0.2.0 |
| created_at | 2025-08-12 00:04:12.131235+00 |
| updated_at | 2025-08-29 23:10:25.340713+00 |
| description | Apache Arrow-compatible, Rust-first columnar data library for high-performance computing, native streaming, and embedded workloads. Minimal dependencies, ultra-low-latency access, automatic 64-byte SIMD alignment, and fast compile times. Great for real-time analytics, HPC pipelines, and systems integration. |
| homepage | |
| repository | https://github.com/pbower/minarrow |
| max_upload_size | |
| id | 1791221 |
| size | 1,591,086 |
Welcome to Minarrow.
Minarrow is a from-scratch columnar library built for real-time and systems workloads in Rust.
It keeps the surface small, makes types explicit, compiles fast, and aligns data for predictable SIMD performance.
It speaks Arrow when you need to talk interchange — but the core stays lean.
Minarrow is the base layer of several related projects that build on it to deliver a full set of SIMD-accelerated kernels, Tokio-streamable buffers, and a full-scale engine.
Minarrow compiles in under 1.5 seconds with default features, and incremental rebuilds take under 0.15 seconds, minimising development iteration time. This is achieved through minimal dependencies: primarily `num-traits`, with optional `rayon` for parallelism.
Minarrow provides direct, always-typed access to array values. Unlike Rust implementations that unify all array types as untyped byte buffers (requiring downcasting and dynamic checks), Minarrow retains concrete types throughout the API, so data can be inspected and manipulated without downcasting or extra indirection.
Six concrete array types cover common workloads. Unified views (such as `NumericArray` and `TextArray`) group them by category, and a single top-level `Array` enum handles mixed tables.
The inner arrays match the Arrow IPC memory layout.
```rust
use std::sync::Arc;
use minarrow::{Array, IntegerArray, NumericArray, arr_bool, arr_cat32, arr_f64, arr_i32, arr_str32};

// Build concrete arrays with the construction macros.
let int_arr = arr_i32![1, 2, 3, 4];
let float_arr = arr_f64![0.5, 1.5, 2.5];
let bool_arr = arr_bool![true, false, true];
let str_arr = arr_str32!["a", "b", "c"];
let cat_arr = arr_cat32!["x", "y", "x", "z"];

assert_eq!(int_arr.len(), 4);
assert_eq!(str_arr.len(), 3);

// Or construct an inner array directly and wrap it into the unified enums.
let int = IntegerArray::<i64>::from_slice(&[100, 200]);
let wrapped: NumericArray = NumericArray::Int64(Arc::new(int));
let array = Array::NumericArray(wrapped);
```
```rust
use minarrow::{FieldArray, Print, Table, arr_i32, arr_str32};

// Wrap arrays in named fields, then assemble them into a table.
let col1 = FieldArray::from_inner("numbers", arr_i32![1, 2, 3]);
let col2 = FieldArray::from_inner("letters", arr_str32!["x", "y", "z"]);

let mut tbl = Table::new("Demo".into(), vec![col1, col2].into());
tbl.print();
```
See _examples/_ for more.
When working with arrays, remember to import the `MaskedArray` trait, which ensures all required methods are available.
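For instance, a minimal sketch (the specific trait methods shown, `push` and `null_count`, are assumed names, not confirmed by this page; check the crate docs for the full trait surface):

```rust
use minarrow::{IntegerArray, MaskedArray};

// With `MaskedArray` in scope, its methods are callable on the concrete
// array types. `push` and `null_count` are assumed method names here.
let mut arr = IntegerArray::<i64>::from_slice(&[1, 2, 3]);
arr.push(4);
assert_eq!(arr.len(), 4);
assert_eq!(arr.null_count(), 0);
```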
Data buffers use `Vec64`, a 64-byte-aligned `Vec` with a custom allocator. The `Lightstream-IO` crate provides IPC readers and writers that maintain this alignment, avoiding reallocation overhead during data ingestion.
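A small sketch of building an aligned buffer directly with the `vec64!` macro (the pointer check is illustrative only, and assumes `Vec64` dereferences to a slice the way `Vec` does):

```rust
use minarrow::vec64;

// `vec64!` builds a `Vec64`; its backing allocation is 64-byte aligned.
let v = vec64![1i64, 2, 3, 4];
assert_eq!(v.len(), 4);
assert_eq!(v.as_ptr() as usize % 64, 0);
```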
Minarrow uses enums for type dispatch instead of trait-object downcasting, providing:

- No `Any` or runtime downcasts
- Direct typed accessors such as `myarray.num().i64()`

The structure is layered:

- `Array` enum – Arc-wrapped for zero-copy sharing
- `NumericArray` – All numeric types in one variant set
- `TextArray` – String and categorical data
- `TemporalArray` – All date/time variants
- `BooleanArray` – Boolean data

This design supports flexible function signatures like `impl Into<NumericArray>` while preserving static typing.
Because dispatch is static, the compiler retains full knowledge of types across calls, enabling inlining and eliminating virtual call overhead.
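As a rough sketch of both ideas (the exact return type of `.num().i64()` is an assumption, and the `take_numeric` helper is purely illustrative):

```rust
use std::sync::Arc;
use minarrow::{Array, IntegerArray, NumericArray};

// A flexible signature: callers pass anything convertible into `NumericArray`.
fn take_numeric(values: impl Into<NumericArray>) -> NumericArray {
    values.into()
}

let inner = IntegerArray::<i64>::from_slice(&[10, 20, 30]);
let numeric = take_numeric(NumericArray::Int64(Arc::new(inner)));

// Enum accessors return the concrete type with no `Any` downcasting.
let arr = Array::NumericArray(numeric);
let typed = arr.num().i64();
assert_eq!(typed.len(), 3);
```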
Interop with the wider ecosystem is provided via `.to_apache_arrow()` and `.to_polars()`. Lightstream (planned Aug ’25) enables IPC streaming in Tokio async contexts with composable encoder/decoder traits, both sync and async, without losing SIMD alignment.
Views of the form `(&InnerArrayVariant, offset, len)` are available.

Benchmark environment: Intel(R) Core(TM) Ultra 7 155H | x86_64 | 22 CPUs
Sum of 1,000 sequential integers starting at 0. Averaged over 1,000 runs (release).
(n=1000, lanes=4, iters=1000)
Case | Avg time |
---|---|
Integer (i64) | |
raw vec: `Vec<i64>` | 85 ns |
minarrow direct: `IntegerArray` | 88 ns |
arrow-rs struct: `Int64Array` | 147 ns |
minarrow enum: `IntegerArray` | 124 ns |
arrow-rs dyn: `Int64Array` | 181 ns |
Float (f64) | |
raw vec: `Vec<f64>` | 475 ns |
minarrow direct: `FloatArray` | 476 ns |
arrow-rs struct: `Float64Array` | 527 ns |
minarrow enum: `FloatArray` | 507 ns |
arrow-rs dyn: `Float64Array` | 1.952 µs |
(n=1000, lanes=4, iters=1000)
Case | Avg (ns) |
---|---|
raw vec: `Vec<i64>` | 64 |
raw vec64: `Vec64<i64>` | 55 |
minarrow direct: `IntegerArray` | 88 |
arrow-rs struct: `Int64Array` | 162 |
minarrow enum: `IntegerArray` | 170 |
arrow-rs dyn: `Int64Array` | 173 |
raw vec: `Vec<f64>` | 57 |
raw vec64: `Vec64<f64>` | 58 |
minarrow direct: `FloatArray` | 91 |
arrow-rs struct: `Float64Array` | 181 |
minarrow enum: `FloatArray` | 180 |
arrow-rs dyn: `Float64Array` | 196 |
Sum of 1 billion sequential integers starting at 0.
(n=1,000,000,000, lanes=4)
Case | Time (ms) |
---|---|
SIMD + Rayon `IntegerArray<i64>` | 113.874 |
SIMD + Rayon `FloatArray<f64>` | 114.095 |
The construction delta between `Vec` and `Vec64` is not included in the benchmark timings above.
Use Case | Description |
---|---|
Real-time Data Pipelines | Zero-copy interchange for streaming and event-driven systems |
Embedded and Edge Computing | Minimal deps, predictable memory layout, fast compile |
Systems-Level Integration | 64-byte alignment and FFI-friendly representation |
High-Performance Analytics | SIMD kernels and direct buffer access |
Rapid Prototyping and Development | Simple type system, intuitive APIs |
Data-Intensive Rust Applications | Rust-native data structures with no runtime abstraction penalty |
Extreme Latency Scenarios | Inner types for trading, defence, and other nanosecond-sensitive systems |
This approach trades some features (like deeply nested types) for a more streamlined experience in common data processing scenarios.
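For the "Extreme Latency Scenarios" row above, the inner typed arrays can be used directly, skipping the enum layer entirely; a minimal sketch using only the constructors shown earlier:

```rust
use minarrow::IntegerArray;

// Work with the concrete inner type directly: no enum wrapper, no dispatch.
let prices = IntegerArray::<i64>::from_slice(&[101, 102, 99]);
assert_eq!(prices.len(), 3);
```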
We welcome contributions! If you are interested in working on the SIMD kernels crate that's in development and have relevant experience, please feel free to reach out.
Please see CONTRIBUTING.md for contributing guidelines.
This project is licensed under the MIT License. See LICENSE for details.
Special thanks to the Apache Arrow community and all contributors to the Arrow ecosystem. Special call-out also to Arrow2 and Polars. Minarrow is inspired by the consistently great work and standards driven by these projects.
We value community input and would appreciate your thoughts and feedback. Please don't hesitate to reach out with questions, suggestions, or contributions.