Crates.io | typed-arrow-unified |
lib.rs | typed-arrow-unified |
version | 0.0.1 |
created_at | 2025-08-21 07:48:02.187724+00 |
updated_at | 2025-09-18 15:24:59.682906+00 |
description | Unified facade (static/dynamic) built on typed-arrow + typed-arrow-dyn |
repository | https://github.com/tonbo-io/typed-arrow |
id | 1804413 |
size | 32,306 |
typed-arrow provides a strongly typed, fully compile-time way to declare Arrow schemas in Rust. It maps Rust types directly to arrow-rs typed builders/arrays and arrow_schema::DataType, without any runtime DataType switching, enabling zero runtime cost, monomorphized column construction and ergonomic ORM-like APIs.
- Avoids runtime DataType matching.
- Uses arrow-array/arrow-schema types directly; no bespoke runtime layer to learn.

use typed_arrow::{prelude::*, schema::SchemaMeta};
use typed_arrow::{Dictionary, TimestampTz, Millisecond, Utc, List};

#[derive(typed_arrow::Record)]
struct Address { city: String, zip: Option<i32> }

#[derive(typed_arrow::Record)]
struct Person {
    id: i64,
    address: Option<Address>,
    tags: Option<List<Option<i32>>>, // List column with nullable items
    code: Option<Dictionary<i32, String>>, // Dictionary<i32, Utf8>
    joined: TimestampTz<Millisecond, Utc>, // Timestamp(ms) with timezone (UTC)
}

fn main() {
    // Build from owned rows
    let rows = vec![
        Person {
            id: 1,
            address: Some(Address { city: "NYC".into(), zip: None }),
            tags: Some(List::new(vec![Some(1), None, Some(3)])),
            code: Some(Dictionary::new("gold".into())),
            joined: TimestampTz::<Millisecond, Utc>::new(1_700_000_000_000),
        },
        Person {
            id: 2,
            address: None,
            tags: None,
            code: None,
            joined: TimestampTz::<Millisecond, Utc>::new(1_700_000_100_000),
        },
    ];
    let mut b = <Person as BuildRows>::new_builders(rows.len());
    b.append_rows(rows);
    let arrays = b.finish();

    // Compile-time schema + RecordBatch
    let batch = arrays.into_record_batch();
    assert_eq!(batch.schema().fields().len(), <Person as Record>::LEN);
    println!("rows={}, field0={}", batch.num_rows(), batch.schema().field(0).name());
}
Add to your Cargo.toml (derives enabled by default):
[dependencies]
typed-arrow = { version = "0.x" }
When working in this repository/workspace:
[dependencies]
typed-arrow = { path = "." }
Run the included examples to see end-to-end usage:
- 01_primitives: derive Record, inspect DataType, build primitives
- 02_lists: List<T> and List<Option<T>>
- 03_dictionary: Dictionary<K, String>
- 04_timestamps: Timestamp<U> units
- 04b_timestamps_tz: TimestampTz<U, Z> with Utc and custom markers
- 05_structs: nested structs → StructArray
- 06_rows_flat: row-based building for flat records
- 07_rows_nested: row-based building with nested struct fields
- 08_record_batch: compile-time schema + RecordBatch
- 09_duration_interval: Duration and Interval types
- 10_union: Dense Union as a Record column (with attributes)
- 11_map: Map (incl. Option<V> values), also as a Record column
- 12_ext_hooks: extend #[derive(Record)] with visitor injection and macro callbacks

Run:
cargo run --example 08_record_batch
- Record: implemented by the derive macro for structs with named fields.
- ColAt<I>: per-column associated items Rust, ColumnBuilder, ColumnArray, NULLABLE, NAME, and data_type().
- ArrowBinding: compile-time mapping from a Rust value type to its Arrow builder, array, and DataType.
- BuildRows: derive generates <Type>Builders and <Type>Arrays with append_row(s) and finish.
- SchemaMeta: derive provides fields() and schema(); the generated <Type>Arrays structs provide into_record_batch().
- AppendStruct and StructMeta: enable nested struct fields and StructArray building.

- Schema-level metadata: #[schema_metadata(k = "owner", v = "data")].
- Field-level metadata: #[metadata(k = "pii", v = "email")].
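The compile-time mapping described under ColAt<I> and ArrowBinding can be pictured with a small, dependency-free sketch. The `Binding` trait and `describe` function below are hypothetical stand-ins, not the typed-arrow API; they only illustrate how a logical type name and a nullability flag can be attached to a Rust type and resolved by monomorphization instead of a runtime match.

```rust
// Hypothetical stand-in, NOT the typed-arrow API: a trait that ties a Rust
// type to a logical type name and a nullability flag at compile time.
trait Binding {
    const NULLABLE: bool;
    fn data_type() -> &'static str;
}

impl Binding for i64 {
    const NULLABLE: bool = false;
    fn data_type() -> &'static str {
        "Int64"
    }
}

impl Binding for String {
    const NULLABLE: bool = false;
    fn data_type() -> &'static str {
        "Utf8"
    }
}

// Wrapping a type in Option flips nullability but keeps the inner logical type.
impl<T: Binding> Binding for Option<T> {
    const NULLABLE: bool = true;
    fn data_type() -> &'static str {
        T::data_type()
    }
}

/// Describe one column. The type name and nullability come from the type
/// parameter alone; there is no runtime match on a type tag.
fn describe<T: Binding>(name: &str) -> String {
    format!("{}: {} (nullable={})", name, T::data_type(), T::NULLABLE)
}
```

In this sketch, `describe::<Option<String>>("city")` and `describe::<i64>("id")` each compile to their own specialized function, which is the property the derive-based approach relies on.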
- Nested struct fields become Struct columns by default. Make the parent field nullable with Option<Nested>; child nullability is independent.
- List<T> (items non-null) and List<Option<T>> (items nullable). Use Option<List<_>> for list-level nulls.
- LargeList<T> and LargeList<Option<T>> for 64-bit offsets; wrap with Option<_> for column nulls.
- FixedSizeList<T, N> (items non-null) and FixedSizeListNullable<T, N> (items nullable). Wrap with Option<_> for list-level nulls.
- Map<K, V, const SORTED: bool = false> where keys are non-null; use Map<K, Option<V>> to allow nullable values. Column nullability via Option<Map<...>>. SORTED sets keys_sorted in the Arrow DataType.
- OrderedMap<K, V> uses BTreeMap<K, V> and declares keys_sorted = true.
- Dictionary<K, V> with integral keys K ∈ { i8, i16, i32, i64, u8, u16, u32, u64 } and values:
  - String/LargeUtf8 (Utf8/LargeUtf8)
  - Vec<u8>/LargeBinary (Binary/LargeBinary)
  - [u8; N] (FixedSizeBinary)
  - i*, u*, f32, f64
  Column nullability via Option<Dictionary<..>>.
- Timestamp<U> (unit-only) and TimestampTz<U, Z> (unit + timezone). Units: Second, Millisecond, Microsecond, Nanosecond. Use Utc or define your own Z: TimeZoneSpec.
- Decimal128<P, S> and Decimal256<P, S> (precision P, scale S as const generics).
- #[derive(Union)] for enums with #[union(mode = "dense"|"sparse")], per-variant #[union(tag = N)], #[union(field = "name")], and optional null carrier #[union(null)] or container-level null_variant = "Var".
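To make the "dense" union mode mentioned above more concrete, here is a minimal sketch of the dense union layout as defined by the Arrow columnar format: one type id and one offset per row, plus one child vector per variant. The `Value` enum, the `DenseUnion` struct, and the tag numbers 0 and 1 are hypothetical and chosen for illustration; this is not the code that #[derive(Union)] generates.

```rust
// Hypothetical enum, used only to illustrate the layout.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i32),
    Text(String),
}

/// Columnar form of a dense union: `type_ids[i]` says which child holds
/// row `i`, and `offsets[i]` is the index of that row inside the child.
#[derive(Debug, Default, PartialEq)]
struct DenseUnion {
    type_ids: Vec<i8>,
    offsets: Vec<i32>,
    ints: Vec<i32>,     // child for tag 0
    texts: Vec<String>, // child for tag 1
}

/// Encode rows into the dense layout; each child grows only when its
/// variant appears, so children can be shorter than the union itself.
fn encode(rows: &[Value]) -> DenseUnion {
    let mut out = DenseUnion::default();
    for row in rows {
        match row {
            Value::Int(v) => {
                out.type_ids.push(0);
                out.offsets.push(out.ints.len() as i32);
                out.ints.push(*v);
            }
            Value::Text(s) => {
                out.type_ids.push(1);
                out.offsets.push(out.texts.len() as i32);
                out.texts.push(s.clone());
            }
        }
    }
    out
}

/// Read one row back by following its type id and offset.
fn get(u: &DenseUnion, row: usize) -> Value {
    let off = u.offsets[row] as usize;
    match u.type_ids[row] {
        0 => Value::Int(u.ints[off]),
        _ => Value::Text(u.texts[off].clone()),
    }
}
```

A sparse union, by contrast, drops the offsets buffer and keeps every child at the full row count, trading space for simpler indexing.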
Supported (arrow-rs v56), including:
- FixedSizeBinary ([u8; N])
- Map (use Option<V> for nullable values), OrderedMap (BTreeMap<K,V>) with keys_sorted = true
- Union (#[derive(Union)] on enums)
- Dictionary values: FixedSizeBinary ([u8; N]), primitives (i*, u*, f32, f64)

Missing:
Extension hooks on #[derive(Record)]:
- #[record(visit(MyVisitor))]
- #[record(field_macro = my_ext::per_field, record_macro = my_ext::per_record)]
- #[record(ext(key))]

See docs/extensibility.md and the runnable example examples/12_ext_hooks.rs.
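As a rough picture of what a visitor-style hook such as #[record(visit(MyVisitor))] might enable, the sketch below walks a fixed set of fields and lets a visitor collect information about them. Everything here (`FieldInfo`, `Visitor`, `visit_person`, `NullableNames`) is hypothetical and written for illustration only; the actual hook signatures are the ones documented in docs/extensibility.md.

```rust
// Hypothetical types for illustration; not the typed-arrow hook API.
struct FieldInfo {
    name: &'static str,
    nullable: bool,
}

trait Visitor {
    fn visit(&mut self, field: &FieldInfo);
}

// Hand-written stand-in for per-field calls that a derive could emit
// for a record with a required `id` and an optional `address`.
fn visit_person<V: Visitor>(v: &mut V) {
    v.visit(&FieldInfo { name: "id", nullable: false });
    v.visit(&FieldInfo { name: "address", nullable: true });
}

/// Example visitor: collects the names of nullable fields.
struct NullableNames(Vec<&'static str>);

impl Visitor for NullableNames {
    fn visit(&mut self, field: &FieldInfo) {
        if field.nullable {
            self.0.push(field.name);
        }
    }
}

/// Run the visitor over the record and return what it collected.
fn nullable_fields() -> Vec<&'static str> {
    let mut v = NullableNames(Vec::new());
    visit_person(&mut v);
    v.0
}
```

The point of the pattern is that the record definition stays unchanged while each visitor decides what to do with the per-field information.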