| Crates.io | arrow-graph |
| lib.rs | arrow-graph |
| version | 0.6.2 |
| created_at | 2025-06-24 02:37:06.700035+00 |
| updated_at | 2025-06-24 22:17:01.309431+00 |
| description | Arrow-native graph processing engine with SQL interface |
| repository | https://github.com/seacurity/arrow-graph |
| id | 1723818 |
| size | 1,022,540 |
A high-performance, Arrow-native graph analytics engine with a SQL interface for modern data processing workflows.
Arrow-Graph brings graph analytics to the Apache Arrow ecosystem.
Arrow-Graph represents graphs as Arrow RecordBatches, enabling zero-copy operations:
Edges Table (RecordBatch):
┌───────┬────────┬────────┐
│ src │ dst │ weight │
├───────┼────────┼────────┤
│ "A" │ "B" │ 1.0 │
│ "B" │ "C" │ 2.0 │
│ "C" │ "D" │ 1.5 │
│ "A" │ "D" │ 3.0 │
└───────┴────────┴────────┘
Nodes: Automatically indexed from unique src/dst values
Internal ID mapping: "A"→0, "B"→1, "C"→2, "D"→3
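The internal ID mapping above can be reproduced with a small sketch. It assumes IDs are handed out in first-seen order, scanning each edge's `src` before its `dst`; the helper name `build_node_index` is hypothetical and not part of the arrow-graph API, and the crate's real indexing strategy may differ.

```rust
use std::collections::HashMap;

/// Assigns dense integer IDs to node labels in first-seen order,
/// visiting each edge's source before its destination.
/// Hypothetical helper for illustration only.
fn build_node_index<'a>(
    edges: &[(&'a str, &'a str)],
) -> (HashMap<&'a str, usize>, Vec<&'a str>) {
    let mut id_of: HashMap<&'a str, usize> = HashMap::new();
    let mut label_of: Vec<&'a str> = Vec::new();
    for &(src, dst) in edges {
        for label in [src, dst] {
            if !id_of.contains_key(label) {
                // The next free ID is the number of labels seen so far.
                id_of.insert(label, label_of.len());
                label_of.push(label);
            }
        }
    }
    (id_of, label_of)
}
```

Applied to the four edges in the table, this yields "A"→0, "B"→1, "C"→2, "D"→3, and `label_of` gives the reverse lookup from ID back to label.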
Add to your Cargo.toml:
[dependencies]
arrow-graph = "0.6"
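use std::sync::Arc;

use arrow::array::{Float64Array, StringArray};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use arrow_graph::prelude::*;

// Schema for the edge table. The column names follow the inline comments
// below and are an assumption; check the crate docs for the expected names.
let schema = Arc::new(Schema::new(vec![
    Field::new("source", DataType::Utf8, false),
    Field::new("target", DataType::Utf8, false),
    Field::new("weight", DataType::Float64, false),
]));

// Create a graph from Arrow data
let edges = RecordBatch::try_new(
    schema,
    vec![
        Arc::new(StringArray::from(vec!["A", "B", "C"])), // source
        Arc::new(StringArray::from(vec!["B", "C", "D"])), // target
        Arc::new(Float64Array::from(vec![1.0, 2.0, 1.5])), // weight
    ],
)?;
let graph = ArrowGraph::from_edges(edges)?;

// Basic graph operations
println!("Nodes: {}", graph.node_count());
println!("Edges: {}", graph.edge_count());
println!("Density: {:.3}", graph.density());

// Navigate the graph
let neighbors = graph.neighbors("A").unwrap();
println!("A connects to: {:?}", neighbors);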
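To make the neighbor lookup concrete, here is a minimal standalone sketch of an adjacency list built from `(source, target)` pairs. It assumes edges are directed and that an unknown node yields `None`; `build_adjacency` and `neighbors` are hypothetical names and do not reflect how arrow-graph stores its data internally.

```rust
use std::collections::BTreeMap;

/// Builds a directed adjacency list from (source, target) pairs.
/// Hypothetical helper for illustration only.
fn build_adjacency<'a>(edges: &[(&'a str, &'a str)]) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut adj: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for &(src, dst) in edges {
        adj.entry(src).or_default().push(dst);
        // Register the target too, so nodes without outgoing edges still appear.
        adj.entry(dst).or_default();
    }
    adj
}

/// Returns the outgoing neighbors of `node`, or `None` if the node is unknown.
fn neighbors<'a, 'b>(
    adj: &'b BTreeMap<&'a str, Vec<&'a str>>,
    node: &str,
) -> Option<&'b [&'a str]> {
    adj.get(node).map(|targets| targets.as_slice())
}
```

With the three quick-start edges, `neighbors(&adj, "A")` returns a slice containing only "B", while "D" is present with an empty neighbor list because it has no outgoing edges.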
Built as DataFusion User-Defined Functions (UDFs) for seamless integration:
-- Find shortest paths
SELECT shortest_path('A', 'D', 'edges_table') as path;
-- Calculate PageRank
SELECT node_id, pagerank() OVER (PARTITION BY graph_id) as rank
FROM nodes_table;
-- Detect communities
SELECT node_id, community_detection('leiden', 'edges_table') as cluster
FROM nodes_table;
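The README does not say which algorithm backs `shortest_path`. As one plausible reading, the sketch below runs Dijkstra's algorithm over integer node IDs with non-negative `f64` weights. The `State` type and the function signature are assumptions made for illustration, and the node IDs are expected to be smaller than `n`.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Heap entry; ordered so that the smallest cost is popped first.
#[derive(Debug)]
struct State {
    cost: f64,
    node: usize,
}

impl Ord for State {
    fn cmp(&self, other: &Self) -> Ordering {
        // BinaryHeap is a max-heap, so compare costs in reverse.
        other
            .cost
            .total_cmp(&self.cost)
            .then_with(|| self.node.cmp(&other.node))
    }
}
impl PartialOrd for State {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for State {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for State {}

/// Dijkstra over directed, non-negatively weighted edges (u, v, w).
/// Returns the total cost and the node sequence, or None if unreachable.
fn shortest_path(
    n: usize,
    edges: &[(usize, usize, f64)],
    src: usize,
    dst: usize,
) -> Option<(f64, Vec<usize>)> {
    let mut adj: Vec<Vec<(usize, f64)>> = vec![Vec::new(); n];
    for &(u, v, w) in edges {
        adj[u].push((v, w));
    }
    let mut dist = vec![f64::INFINITY; n];
    let mut prev: Vec<Option<usize>> = vec![None; n];
    let mut heap = BinaryHeap::new();
    dist[src] = 0.0;
    heap.push(State { cost: 0.0, node: src });

    while let Some(State { cost, node }) = heap.pop() {
        if node == dst {
            break;
        }
        if cost > dist[node] {
            continue; // stale heap entry; a shorter route was already found
        }
        for &(next, w) in &adj[node] {
            let candidate = cost + w;
            if candidate < dist[next] {
                dist[next] = candidate;
                prev[next] = Some(node);
                heap.push(State { cost: candidate, node: next });
            }
        }
    }

    if dist[dst].is_infinite() {
        return None;
    }
    // Walk the predecessor chain backwards, then reverse it.
    let mut path = vec![dst];
    let mut cur = dst;
    while let Some(p) = prev[cur] {
        path.push(p);
        cur = p;
    }
    path.reverse();
    Some((dist[dst], path))
}
```

Using the four-edge table with "A"→0 through "D"→3, the route from A to D is the direct edge with cost 3.0, since the three-hop route A, B, C, D costs 4.5.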
Current prototype (basic graph metrics available):
// Register UDF with DataFusion
ctx.register_udf(create_udf(
    "graph_density",
    vec![DataType::Utf8], // table name
    Arc::new(DataType::Float64),
    Volatility::Stable,
    Arc::new(|args| {
        let edges = get_edges_table(args[0].as_ref())?;
        Ok(ColumnarValue::Scalar(ScalarValue::Float64(
            Some(calculate_density(&edges)?)
        )))
    }),
));
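The body of `calculate_density` is not shown. A minimal sketch of what such a metric could compute, assuming the standard directed-graph definition m / (n · (n − 1)) with n distinct nodes and m edges, follows. The function name `directed_density` is hypothetical, and the crate may instead use the undirected formula or deduplicate parallel edges.

```rust
use std::collections::HashSet;

/// Density of a directed graph: edges / (nodes * (nodes - 1)).
/// Parallel edges are counted as given; graphs with fewer than two
/// nodes are defined here to have density 0.0.
/// Hypothetical helper for illustration only.
fn directed_density(edges: &[(&str, &str)]) -> f64 {
    // Collect the distinct node labels appearing on either end of an edge.
    let nodes: HashSet<&str> = edges.iter().flat_map(|&(src, dst)| [src, dst]).collect();
    let n = nodes.len() as f64;
    if n < 2.0 {
        return 0.0;
    }
    edges.len() as f64 / (n * (n - 1.0))
}
```

For the three quick-start edges over four nodes this gives 3 / 12 = 0.25.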
Built for modern data scales. Comprehensive benchmarks are planned for the v0.2.0 release.
Arrow-Graph is designed for modern graph analytics workflows:
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ SQL Layer │ │ Algorithms │ │ Storage │
│ │ │ │ │ │
│ DataFusion UDFs │───▶│ SIMD Optimized │───▶│ Arrow Columnar │
│ Graph Functions │ │ Vectorized Ops │ │ Zero-Copy │
│ Pattern Matching│ │ Streaming Algos │ │ Memory Mapped │
└─────────────────┘ └──────────────────┘ └─────────────────┘
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Licensed under the Apache License, Version 2.0. See LICENSE for details.
Inspired by Apache Flink Gelly, built for the modern Arrow ecosystem.