| Crates.io | datafusion-dist |
| lib.rs | datafusion-dist |
| version | 0.3.0 |
| created_at | 2025-12-29 06:13:28.186434+00 |
| updated_at | 2026-01-23 13:18:06.970362+00 |
| description | A distributed streaming execution library for Apache DataFusion |
| homepage | https://github.com/systemxlabs/datafusion-dist |
| repository | https://github.com/systemxlabs/datafusion-dist |
| max_upload_size | |
| id | 2010028 |
| size | 142,056 |
A distributed streaming execution library for Apache DataFusion.
datafusion-dist enables distributed query execution for DataFusion, allowing you to scale analytical workloads across multiple nodes.

Consider a SQL query:

    SELECT * FROM t1 JOIN t2 ON t1.name = t2.name

On a single node, DataFusion produces the following physical plan:

    CoalesceBatchesExec: target_batch_size=8192
      HashJoinExec: mode=Partitioned, join_type=Inner, on=[(name@0, name@0)]
        CoalesceBatchesExec: target_batch_size=8192
          RepartitionExec: partitioning=Hash([name@0], 12), input_partitions=2
            DataSourceExec: partitions=2, partition_sizes=[1, 1]
        CoalesceBatchesExec: target_batch_size=8192
          RepartitionExec: partitioning=Hash([name@0], 12), input_partitions=2
            DataSourceExec: partitions=2, partition_sizes=[1, 1]
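
Both join inputs pass through `RepartitionExec` with `Hash([name@0], 12)`, so rows with the same `name` end up in the same one of 12 partitions on either side, which is what lets `HashJoinExec` run in `Partitioned` mode. The sketch below illustrates that idea only. The `partition_for` helper is hypothetical, and it uses the standard library's `DefaultHasher` rather than DataFusion's own hash function, so the partition numbers it prints will not match DataFusion's.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: map a join key to one of `num_partitions` partitions.
/// Assumes `num_partitions` is greater than zero.
fn partition_for(key: &str, num_partitions: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    // Equal keys hash to equal values, so they land in the same partition.
    hasher.finish() % num_partitions
}

fn main() {
    // Illustrative `name` values from two tables (not taken from the project).
    let t1_names = ["alice", "bob", "carol"];
    let t2_names = ["bob", "alice", "dave"];

    for name in t1_names.iter().chain(t2_names.iter()) {
        println!("{name} -> partition {}", partition_for(name, 12));
    }
}
```

Because matching keys from `t1` and `t2` share a partition index, each of the 12 join partitions can be processed independently of the others.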
For distributed execution, the plan is split into three stages. In stage 0, each join input is replaced by a `ProxyExec` that delegates to another stage:

    ===============Stage 0 (partitions=12)===============
    CoalesceBatchesExec: target_batch_size=8192
      HashJoinExec: mode=Partitioned, join_type=Inner, on=[(name@0, name@0)]
        ProxyExec: delegated_plan=CoalesceBatchesExec, delegated_stage=2
        ProxyExec: delegated_plan=CoalesceBatchesExec, delegated_stage=1
    ===============Stage 1 (partitions=12)===============
    CoalesceBatchesExec: target_batch_size=8192
      RepartitionExec: partitioning=Hash([name@0], 12), input_partitions=2
        DataSourceExec: partitions=2, partition_sizes=[1, 1]
    ===============Stage 2 (partitions=12)===============
    CoalesceBatchesExec: target_batch_size=8192
      RepartitionExec: partitioning=Hash([name@0], 12), input_partitions=2
        DataSourceExec: partitions=2, partition_sizes=[1, 1]
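The partitions of each stage are then assigned to nodes, written here as `stage/{partitions}->node`:

    0/{0,3,6,9}, 1/{0,1,2,3,4,5,6,7,8,9,10,11} -> localhost:50060
    0/{1,4,7,10} -> localhost:50070
    0/{2,5,8,11}, 2/{0,1,2,3,4,5,6,7,8,9,10,11} -> localhost:50080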
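
In the assignment above, the 12 partitions of stage 0 are spread across three nodes in a pattern consistent with round-robin (partition `p` goes to node `p % 3`), while stages 1 and 2 each run entirely on one node. The following is a minimal sketch of round-robin assignment under that reading. The `round_robin` function is hypothetical and is not part of datafusion-dist's API; the library's actual scheduling logic may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: assign partitions 0..num_partitions to nodes in
/// round-robin order. Assumes `nodes` is non-empty.
fn round_robin<'a>(num_partitions: usize, nodes: &[&'a str]) -> BTreeMap<&'a str, Vec<usize>> {
    let mut assignment: BTreeMap<&'a str, Vec<usize>> = BTreeMap::new();
    for partition in 0..num_partitions {
        // Partition p goes to node (p mod number of nodes).
        let node = nodes[partition % nodes.len()];
        assignment.entry(node).or_default().push(partition);
    }
    assignment
}

fn main() {
    // Node addresses copied from the task distribution shown above.
    let nodes = ["localhost:50060", "localhost:50070", "localhost:50080"];

    for (node, partitions) in round_robin(12, &nodes) {
        println!("stage 0, partitions {partitions:?} -> {node}");
    }
}
```

With 12 partitions and three nodes, each node receives four partitions of stage 0, matching the sets `{0,3,6,9}`, `{1,4,7,10}`, and `{2,5,8,11}` in the distribution above.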
    datafusion-dist/
    ├── dist/                  # Core distributed execution library
    ├── clusters/
    │   └── postgres/          # PostgreSQL-based cluster management
    ├── networks/
    │   └── tonic/             # gRPC network layer using Tonic
    └── integration-tests/     # Integration test suite
This project is licensed under the MIT License — see the LICENSE file for details.